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Preface 


This book provides a comprehensive introduction into the emerging field 
of probabilistic robotics. Probabilistic robotics is a subfield of robotics con- 
cerned with perception and control. It relies on statistical techniques for 
representing information and making decisions. By doing so, it accommo- 
dates the uncertainty that arises in most contemporary robotics applications. 
In recent years, probabilistic techniques have become one of the dominant 
paradigms for algorithm design in robotics. This monograph provides a first 
comprehensive introduction into some of the major techniques in this field. 

This book has a strong focus on algorithms. All algorithms in this book 
are based on a single overarching mathematical foundation: Bayes rule, and 
its temporal extension known as Bayes filters. This unifying mathematical 
framework is the core commonality of probabilistic algorithms. 

In writing this book, we have tried to be as complete as possible with re- 
gards to technical detail. Each chapter describes one or more major algo- 
rithms. For each algorithm, we provide the following four things: (1) an ex- 
ample implementation in pseudo code; (2) a complete mathematical deriva- 
tion from first principles that makes the various assumptions behind each al- 
gorithm explicit; (3) empirical results insofar as they further the understand- 
ing of the algorithms presented in the book; and (4) a detailed discussion of 
the strengths and weaknesses of each algorithm—from a practitioner’s per- 
spective. Developing all this for many different algorithms proved to be a 
laborious task. The result might at times be a bit difficult to digest for the 
casual reader—although skipping the mathematical derivation sections is al- 
ways an option! We hope that a careful reader emerges with a much deeper 
level of understanding than any superficial, non-mathematical exposition of 
this topic would have been able to convey. 
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Preface 


This book is the result of more than a decade of research by us, the authors, 
our students, and many of our colleagues in the field. We began writing it 
in 1999, hoping that it would take not much more than a few months to 
complete this book. Five years have passed, and almost nothing from the 
original draft has survived. Through working on this book, we have learned 
much more about information and decision theory than we thought we ever 
would. We are happy to report that much of what we learned has made it 
into this book. 

This monograph is written for students, researchers, and practitioners in 
robotics. We believe everybody building robots has to develop software. 
Hence the material in this book should be relevant to every roboticist. It 
should also be of interest to applied statisticians, and people concerned with 
real-world sensor data outside the realm of robotics. To serve a wide range 
of readers with varying technical backgrounds, we have attempted to make 
this book as self-contained as possible. Some prior knowledge of linear alge- 
bra and basic probability and statistics will be helpful, but we have included 
a primer for the basic laws of probability, and avoided the use of advanced 
mathematical techniques throughout this text. 

This book is also written for classroom use. Each chapter offers a number 
of exercises and suggests hands-on projects. When used in the classroom, 
each chapter should be covered in one or two lectures. Chapters should be 
skipped or reordered quite arbitrarily; in fact, in our own teaching we usually 
start right in the middle of the book, with Chapter 7. We recommend that the 
study of the book be accompanied by practical, hands-on experimentation as 
directed by the exercises at the end of each chapter. Nothing more important 
in robotics than doing it yourself! 

Despite our very best efforts, we believe there will still be techni- 
cal errors left in this book. Many of these errors have been corrected 
in this third printing of the book. We continue to post corrections on 
the book’s Web site, along with other materials relevant to this book: 

www.probabilistic-robotics.org 
We hope you enjoy this book! 


Sebastian Thrun 
Wolfram Burgard 
Dieter Fox 
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1.1 


Introduction 


Uncertainty in Robotics 


Robotics is the science of perceiving and manipulating the physical world 
through computer-controlled devices. Examples of successful robotic sys- 
tems include mobile platforms for planetary exploration, industrial robotics 
arms in assembly lines, cars that travel by themselves, and manipulators that 
assist surgeons. Robotics systems are situated in the physical world, perceive 
information on their environments through sensors, and manipulate through 
physical forces. 

While much of robotics is still in its infancy, the idea of "intelligent" manip- 
ulating devices has an enormous potential to change society. Wouldn't it be 
great if all our cars were able to safely steer themselves, making car accidents 
a notion of the past? Wouldn't it be great if robots, and not people, would 
clean up nuclear disaster sites like Chernobyl? Wouldn't it be great if our 
homes were populated by intelligent assistants that take care of all domestic 
repair and maintenance tasks? 

To do these tasks, robots have to be able to accommodate the enormous 
uncertainty that exists in the physical world. There is a number of factors 
that contribute to a robot's uncertainty. 

First and foremost, robot environments are inherently unpredictable. While 
the degree of uncertainty in well-structured environments such as assembly 
lines is small, environments such as highways and private homes are highly 
dynamic and in many ways highly unpredictable. The uncertainty is partic- 
ularly high for robots operating in the proximity of people. 

Sensors are limited in what they can perceive. Limitations arise from sev- 
eral factors. The range and resolution of a sensor is subject to physical limi- 
tations. For example, cameras cannot see through walls, and the spatial res- 
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olution of a camera image is limited. Sensors are also subject to noise, which 
perturbs sensor measurements in unpredictable ways and hence limits the 
information that can be extracted. And finally, sensors can break. Detecting 
a faulty sensor can be extremely difficult. 

Robot actuation involves motors that are, at least to some extent, unpre- 
dictable. Uncertainty arises from effects like control noise, wear-and-tear, 
and mechanical failure. Some actuators, such as heavy-duty industrial robot 
arms, are quite accurate and reliable. Others, like low-cost mobile robots, can 
be extremely flaky. 

Some uncertainty is caused by the robot's software. All internal models 
of the world are approximate. Models are abstractions of the real world. 
As such, they only partially model the underlying physical processes of the 
robot and its environment. Model errors are a source of uncertainty that has 
often been ignored in robotics, despite the fact that most robotic models used 
in state-of-the-art robotics systems are rather crude. 

Uncertainty is further created through algorithmic approximations. Robots 
are real-time systems. This limits the amount of computation that can be 
carried out. Many popular algorithms are approximate, achieving timely 
response through sacrificing accuracy. 

The level of uncertainty depends on the application domain. In some 
robotic applications, such as assembly lines, humans can cleverly engineer 
the system so that uncertainty is only a marginal factor. In contrast, robots 
operating in residential homes or on other planets will have to cope with sub- 
stantial uncertainty. Such robots are forced to act even though neither their 
sensors, nor their internal models, will provide it with sufficient informa- 
tion to make the right decisions with absolute certainty. As robotics is now 
moving into the open world, the issue of uncertainty has become a major 
stumbling block for the design of capable robot systems. Managing uncer- 
tainty is possibly the most important step towards robust real-world robot 
systems. 

Hence this book. 


Probabilistic Robotics 


This book provides a comprehensive overview of probabilistic robotics. Prob- 
abilistic robotics is a relatively new approach to robotics that pays tribute to 
the uncertainty in robot perception and action. The key idea in probabilistic 
robotics is to represent uncertainty explicitly using the calculus of probability 
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theory. Put differently, instead of relying on a single "best guess" as to what 
might be the case, probabilistic algorithms represent information by prob- 
ability distributions over a whole space of guesses. By doing so, they can 
represent ambiguity and degree of belief in a mathematically sound way. 
Control choices can be made robust relative the uncertainty that remains, 
and probabilistic robotics can even actively chose to reduce their uncertainty 
when this appears to be the superior choice. Thus, probabilistic algorithms 
degrade gracefully in the face of uncertainty. As a result, they outperform 
alternative techniques in many real-world applications. 

We shall illustrate probabilistic robotics with two motivating examples: 
one pertaining to robot perception, and another to planning and control. 

Our first example is mobile robot localization. Robot localization is the 
problem of estimating a robot's coordinates relative to an external reference 
frame. The robot is given a map of its environment, but to localize itself rela- 
tive to this map it needs to consult its sensor data. Figure 1.1 illustrates such 
a situation. The environment is known to possess three indistinguishable 
doors. The task of the robot is to find out where it is, through sensing and 
motion. 

This specific localization problem is known as global localization. In global 
localization, a robot is placed somewhere in a known environment and has to 
localize itself from scratch. The probabilistic paradigm represents the robot's 
momentary belief by a probability density function over the space of all lo- 
cations. This is illustrated in diagram (a) in Figure 1.1. This diagram shows 
a uniform distribution over all locations. Now suppose the robot takes a 
first sensor measurement and observes that it is next to a door. Probabilis- 
tic techniques exploit this information to update the belief. The ‘posterior’ 
belief is shown in diagram (b) in Figure 1.1. It places an increased proba- 
bility at places near doors, and lower probability near walls. Notice that this 
distribution possesses three peaks, each corresponding to one of the indistin- 
guishable doors in the environment. Thus, by no means does the robot know 
where it is. Instead, it now has three, distinct hypotheses which are each 
equally plausible given the sensor data. We also note that the robot assigns 
positive probability to places not next to a door. This is the natural result 
of the inherent uncertainty in sensing: With a small, non-zero probability, 
the robot might have erred in its assessment of seeing a door. The ability to 
maintain low-probability hypotheses is essential for attaining robustness. 

Now suppose the robot moves. Diagram (c) in Figure 1.1 shows the effect 
on a robot's belief. The belief has been shifted in the direction of motion. 
It also possesses a larger spread, which reflects the uncertainty that is intro- 
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(d) 


(e) 


Figure 1.1 The basic idea of Markov localization: A mobile robot during global local- 
ization. Markov localization techniques will be investigated in Chapters 7 and 8. 
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Figure 1.2. Top image: a robot navigating through open, featureless space may lose 
track of where it is. Bottom: This can be avoided by staying near known obstacles. 
These figures are results of an algorithm called coastal navigation, which will be dis- 
cussed in Chapter 16. Images courtesy of Nicholas Roy, MIT. 


Srjs.cn QOOOOO 


BAYES FILTER 


COASTAL NAVIGATION 


1 Introduction 


duced by robot motion. Diagram (d) in Figure 1.1 depicts the belief after 
observing another door. This observation leads our algorithm to place most 
of the probability mass on a location near one of the doors, and the robot is 
now quite confident as to where it is. Finally, Diagram (e) shows a belief as 
the robot travels further down the corridor. 

This example illustrates many aspects of the probabilistic paradigm. 
Stated probabilistically, the robot perception problem is a state estimation 
problem, and our localization example uses an algorithm known as Bayes 
filter for posterior estimation over the space of robot locations. The repre- 
sentation of information is a probability density function. The update of this 
function represents the information gained through sensor measurements, 
or the information lost through processes in the world that increase a robot's 
uncertainty. 

Our second example brings us into the realm of robotic planning and con- 
trol. As just argued, probabilistic algorithms can compute a robot’s momen- 
tary uncertainty. But they can also anticipate future uncertainty, and take 
such uncertainty into consideration when determining the right choice of 
control. One such algorithm is called coastal navigation. An example of coastal 
navigation is shown in Figure 1.2. This figure shows a 2-D map of an actual 
building. The top diagram compares an estimated path with an actual path: 
The divergence is the result of the uncertainty in robot motion that we just 
discussed. The interesting insight is: not all trajectories induce the same level 
of uncertainty. The path in Figure 1.2a leads through relatively open space, 
deprived of features that could help the robot to remain localized. Figure 1.2b 
shows an alternative path. This trajectory seeks a distinct corner, and then 
"hugs" the wall so as to stay localized. Not surprisingly, the uncertainty will 
be reduced for the latter path, hence chances of arriving at the goal location 
are actually higher. 

This example illustrates one of the many ways proper consideration of 
uncertainty affects the robot's controls. In our example, the anticipation of 
possible uncertainty along one trajectory makes the robot prefer a second, 
longer path, just so as to reduce the uncertainty. The new path is better, in 
the sense that the robot has a much higher chance of actually being at the 
goal when believing that it is. In fact, the second path is an example of active 
information gathering. The robot has, through its probabilistic considera- 
tion, determined that the best choice of action is to seek information along 
its path, in its pursuit to reach a target location. Probabilistic planning tech- 
niques anticipate uncertainty and can plan for information gathering, and 
probabilistic control techniques realize the results of such plans. 
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Implications 


Probabilistic robotics seamlessly integrates models with sensor data, over- 
coming the limitations of both at the same time. These ideas are not just a 
matter of low-level control. They cut across all levels of robotic software, 
from the lowest to the highest. 

In contrast with traditional programming techniques in robotics—such 
as model-based motion planning techniques or reactive behavior-based 
approaches—probabilistic approaches tend to be more robust in the face of 
sensor limitations and model limitations. This enables them to scale much 
better to complex real-world environments than previous paradigms, where 
uncertainty is of even greater importance. In fact, certain probabilistic al- 
gorithms are currently the only known working solutions to hard robotic 
estimation problems, such as the localization problem discussed a few pages 
ago, or the problem of building accurate maps of very large environments. 

In comparison to traditional model-based robotic techniques, probabilistic 
algorithms have weaker requirements on the accuracy of the robot’s models, 
thereby relieving the programmer from the insurmountable burden to come 
up with accurate models. Probabilistic algorithms have weaker requirements 
on the accuracy of robotic sensors than those made by many reactive tech- 
niques, whose sole control input is the momentary sensor input. Viewed 
probabilistically, the robot learning problem is a long-term estimation problem. 
Thus, probabilistic algorithms provide a sound methodology for many fla- 
vors of robot learning. 

However, these advantages come at a price. The two most frequently cited 
limitations of probabilistic algorithms are computational complexity, and a need 
to approximate. Probabilistic algorithms are inherently less efficient than their 
non-probabilistic counterparts. This is due to the fact that they consider en- 
tire probability densities instead of a single guess. The need to approximate 
arises from the fact that most robot worlds are continuous. Computing exact 
posterior distributions tends to be computationally intractable. Sometimes, 
one is fortunate in that the uncertainty can be approximated tightly with a 
compact parametric model (e.g., Gaussians). In other cases, such approxima- 
tions are too crude to be of use, and more complicated representations must 
be employed. 

Recent developments in computer hardware has made an unprecedented 
number of FLOPS available at bargain prices. This development has cer- 
tainly aided the field of probabilistic robotics. Further, recent research has 
successfully increased the computational efficiency of probabilistic algo- 
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rithms, for a range of hard robotics problems—many of which are described 
in depth in this book. Nevertheless, computational challenges remain. We 
shall revisit this discussion at numerous places, where we investigate the 
strengths and weaknesses of specific probabilistic solutions. 


Road Map 


This book is organized in four major parts. 


* Chapters 2 through 4 introduce the basic mathematical framework that 
underlies all of the algorithms described in this book, along with key al- 
gorithms. These chapters are the mathematical foundation of this book. 


* Chapters 5 and 6 present probabilistic models of mobile robots. In 
many ways, these chapters are the probabilistic generalization of classi- 
cal robotics models. They form the robotic foundation for the material 
that follows. 


* The mobile robot localization problem is discussed in Chapters 7 and 8. 
These chapters combine the basic estimation algorithms with the proba- 
bilistic models discussed in the previous two chapters. 


* Chapters 9 through 13 discuss the much richer problem of robotic map- 
ping. As before, they are all based on the algorithms discussed in the 
foundational chapters, but many of them utilize tricks to accommodate 
the enormous complexity of this problem. 


* Problems of probabilistic planning and control are discussed in Chap- 
ters 14 through 17. Here we begin by introducing a number of fundamen- 
tal techniques, and then branch into practical algorithms for controlling a 
robot probabilistically. The final chapter, Chapter 17, discusses the prob- 
lem of robot exploration from a probabilistic perspective. 


The book is best read in order, from the beginning to the end. However, we 
have attempted to make each individual chapter self-explanatory. Frequent 
sections called “Mathematical Derivation of ...” can safely be skipped on first 
reading without compromising the coherence of the overall material in this 
book. 
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Teaching Probabilistic Robotics 


When used in the classroom, we do not recommend to teach the chapters 
in order—unless the students have an unusually strong appreciation of ab- 
stract mathematical concepts. Particle filters are easier to teach than Gaussian 
filters, and students tend to get more excited by mobile robot localization 
problems than abstract filter algorithms. In our own teachings, we usually 
begin with Chapter 2, and move directly to Chapters 7 and 8. While teaching 
localization, we go back to the material in Chapters 3 through 6 as needed. 
We also teach Chapter 14 early, to expose students to the problems related to 
planning and control early on in a course. 

As a teacher, feel free to use slides and animations from the book's Web site 

www.probabilistic-robotics.org 
to illustrate the various algorithms in this book. And feel free to send us, the 
authors, pointers to your class Web sites and any material that could help 
others in teaching Probabilistic Robotics. 

The material in this book is best taught with hands-on implementation 
assignments. There is nothing more educational in robotics than program- 
ming an actual robot. And nobody can explain the pitfalls and challenges in 
robotics better than Nature! 


Bibliographical Remarks 


The field of robotics has gone through a series of paradigms for software design. The first major 
paradigm emerged in the mid-1970s, and is known as the model-based paradigm. The model- 
based paradigm began with a number of studies showing the hardness of controlling a high- 
DOF robotic manipulator in continuous spaces (Reif 1979). It culminated in text like Schwartz 
et al.’s (1987) analysis of the complexity of robot motion, a first singly exponential general mo- 
tion planning algorithm by Canny (1987), and Latombe's (1991) seminal introductory text into 
the field of model-based motion planning (additional milestone contributions will be discussed 
in Chapter 14). This early work largely ignored the problem of uncertainty—even though it 
extensively began using randomization as a technique for solving hard motion planning prob- 
lems (Kavraki et al. 1996). Instead, the assumption was that a full and accurate model of the 
robot and the environment be given, and the robot be deterministic. The model had to be suf- 
ficiently accurate that the residual uncertainty was managed by a low-level motion controller. 
Most motion planning techniques simply produced a single reference trajectory for the control 
of a manipulator, although ideas such as potential fields (Khatib 1986) and navigation functions 
(Koditschek 1987) provided mechanisms for reacting to the unforeseen—as long as it could be 
sensed. Applications of these early techniques, if any, were confined to environments where 
every little bit of uncertainty could be engineered away, or sensed with sufficient accuracy. 

The field took a radical shift in the mid-1980s, when the lack of sensory feedback became the 
focus of an entire community of researchers within robotics. With strong convictions, the field 
of behavior-based robotics rejected the idea of any internal model. Instead, it was the interaction 
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with a physical environment of a situated agent (Kaelbling and Rosenschein 1991) that created 
the complexity in robot motion (a phenomena often called emergent behavior (Steels 1991)). Con- 
sequently, sensing played a paramount role, and internal models were rejected (Brooks 1990). 

The enthusiasm in this field was fueled by some early successes that were far beyond the 
reach of traditional model-based motion planning algorithms. One of them was “Genghis,” 
a hexapod robot developed by Brooks (1986). A relatively simple finite state automaton was 
able to control the gait of this robot even in rugged terrain. The key to success of such tech- 
niques lay in sensing: the control was entirely driven by environment interaction, as perceived 
through the robot’s sensors. Some of the early work impressed by creating a seemingly complex 
robot through clever exploitation of environment feedback (Connell 1990). More recently, the 
paradigm enjoyed commercial success through a robotic vacuum cleaning robot (IRobots Inc. 
2004), whose software follows the behavior-based paradigm. 

Due to the lack of internal models and a focus on simple control mechanism, most robot sys- 
tems were confined to relatively simple tasks, where the momentary sensor information was 
sufficient to determine the right choice of control. Recognizing this limitation, more recent 
work in this field embraced hybrid control architectures (Arkin 1998), in which behavior-based 
technique provided low-level control, whereas a model-based planner coordinated the robot's 
actions at a high, abstract level. Such hybrid architectures are commonplace in robotics today. 
They are not dissimilar to the seminal work on three-layered architectures by Gat (1998), which 
took its origins in “Shakey the Robot” (Nilsson 1984). 

Modern probabilistic robotics has emerged since the mid-1990s, although its roots can be 
traced back to the invention of the Kalman filter (Kalman 1960). In many ways, probabilistic 
robotics falls in between model-based and behavior-based techniques. In probabilistic robotics, 
there are models, but they are assumed to be incomplete and insufficient for control. There are 
also sensor measurements, but they too are assumed to be incomplete and insufficient for con- 
trol. Through the integration of both, models and sensor measurements, a control action can 
be devised. Statistics provides the mathematical glue to integrate models and sensor measure- 
ments. 

Many of the key advances in the field of probabilistic robotics will be discussed in future 
chapters. Some of the cornerstones in this field include the advent of Kalman filter techniques 
for high-dimensional perception problems by Smith and Cheeseman (1986), the invention of 
occupancy grid maps by (Elfes 1987; Moravec 1988), and the re-introduction of partially observ- 
able planning techniques due to Kaelbling et al. (1998). The past decade has seen an explosion 
of techniques: Particle filters have become vastly popular (Dellaert et al. 1999), and researchers 
have developed new programming methodologies focused on Bayesian information processing 
(Thrun 2000b; Lebeltel et al. 2004; Park et al. 2005). This development went hand in hand with 
the deployment of physical robot systems driven by probabilistic algorithms, such as industrial 
machines for cargo handling by Durrant-Whyte (1996), entertainment robots in museums (Bur- 
gard et al. 1999a; Thrun et al. 2000a; Siegwart et al. 2003), and robots in nursing and health care 
(Pineau et al. 2003d). An open-source software package for mobile robot control that heavily 
utilizes probabilistic techniques is described in Montemerlo et al. (2003a). 

The field of commercial robotics is also at a turning point. In its annual World Robotics Sur- 
vey, the United Nations and the International Federation of Robotics 2004 finds a 19% annual increase 
in the size of the robotic market worldwide. Even more spectacular is the change of market seg- 
ments, which indicates a solid transition from industrial applications to service robotics and 
consumer products. 


Srjs.cn 000000 


2.1 


Recursive State Estimation 


Introduction 


At the core of probabilistic robotics is the idea of estimating state from sen- 
sor data. State estimation addresses the problem of estimating quantities 
from sensor data that are not directly observable, but that can be inferred. 
In most robotic applications, determining what to do is relatively easy if one 
only knew certain quantities. For example, moving a mobile robot is rel- 
atively easy if the exact location of the robot and all nearby obstacles are 
known. Unfortunately, these variables are not directly measurable. Instead, 
a robot has to rely on its sensors to gather this information. Sensors carry 
only partial information about those quantities, and their measurements are 
corrupted by noise. State estimation seeks to recover state variables from the 
data. Probabilistic state estimation algorithms compute belief distributions 
over possible world states. An example of probabilistic state estimation was 
already encountered in the introduction to this book: mobile robot localiza- 
tion. 

The goal of this chapter is to introduce the basic vocabulary and mathe- 
matical tools for estimating state from sensor data. 


* Chapter 2.2 introduces basic probabilistic concepts used throughout the 
book. 


* Chapter 2.3 describes our formal model of robot environment interaction, 
setting forth some of the key terminology used throughout the book. 


* Chapter 2.4 introduces Bayes filters, the recursive algorithm for state esti- 
mation that forms the basis of virtually every technique presented in this 
book. 
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* Chapter 2.5 discusses representational and computational issues that arise 
when implementing Bayes filters. 


Basic Concepts in Probability 


This section familiarizes the reader with the basic notation and probabilistic 
facts used throughout the book. In probabilistic robotics, quantities such as 
sensor measurements, controls, and the states of a robot and its environment 
are all modeled as random variables. Random variables can take on multiple 
values, and they do so according to specific probabilistic laws. Probabilistic 
inference is the process of calculating these laws for random variables that 
are derived from other random variables and the observed data. 

Let X denote a random variable and x denote a specific value that X might 
assume. A standard example of a random variable is a coin flip, where X can 
take on the values heads or tails. If the space of all values that X can take on 
is discrete, as is the case if X is the outcome of a coin flip, we write 


to denote the probability that the random variable X has value x. For exam- 
ple, a fair coin is characterized by p(X = head) = p(X = tail) = 4. Discrete 
probabilities sum to one, that is, 


Š ox =r) = 1 








Probabilities are always non-negative, that is, p(X = x) > 0. 

To simplify the notation, we will usually omit explicit mention of the ran- 
dom variable whenever possible, and instead use the common abbreviation 
p(x) instead of writing p(X = x). 

Most techniques in this book address estimation and decision making in 
continuous spaces. Continuous spaces are characterized by random vari- 
ables that can take on a continuum of values. Unless explicitly stated, we 
assume that all continuous random variables possess probability density func- 
tions (PDFs). A common density function is that of the one-dimensional 
normal distribution with mean p and variance o?. The PDF of a normal distri- 
bution is given by the following Gaussian function: 


ple) = (iuo?) opf- 6-9 
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Normal distributions play a major role in this book. We will frequently ab- 
breviate them as N (z; u, o°), which specifies the random variable, its mean, 
and its variance. 

The Normal distribution (2.3) assumes that x is a scalar value. Often, x will 
be a multi-dimensional vector. Normal distributions over vectors are called 
multivariate. Multivariate normal distributions are characterized by density 
functions of the following form: 


p(z) = det(253) 3 exp(-i(z— n)" E^ (s — 1) 


Here u is the mean vector. X a positive semidefinite and symmetric matrix called 
the covariance matrix. The superscript " marks the transpose of a vector. The 
argument in the exponent in this PDF is quadratic in z, and the parameters 
of this quadratic function are p and X. 

The reader should take a moment to realize that Equation (2.4) is a strict 
generalization of Equation (2.3); both definitions are equivalent if x is a scalar 
value and X = 0°. 

Equations (2.3) and (2.4) are examples of PDFs. Just as discrete probability 
distribution always sums up to 1, a PDF always integrates to 1: 


fræ dx = 1 


However, unlike a discrete probability, the value of a PDF is not upper- 
bounded by 1. Throughout this book, we will use the terms probability, proba- 
bility density, and probability density function interchangeably. We will silently 
assume that all continuous random variables are measurable, and we also 
assume that all continuous distributions actually possess densities. 

The joint distribution of two random variables X and Y is given by 


p(z,y) = p(X =xandY =y) 


This expression describes the probability of the event that the random vari- 
able X takes on the value x and that Y takes on the value y. If X and Y are 
independent, we have 


p(zx,y) = p(x) ply) 


Often, random variables carry information about other random variables. 
Suppose we already know that Y’s value is y, and we would like to know the 
probability that X's value is z conditioned on that fact. Such a probability 
will be denoted 


p(x | y) p(X-—z|Y-wvy) 
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and is called conditional probability. If p(y) > 0, then the conditional probabil- 
ity is defined as 


P(x, y) 
p(y) 


If X and Y are independent, we have 





plz |y) = 


p(z) ply) 

Mey er A 
In other words, if X and Y are independent, Y tells us nothing about the 
value of X. There is no advantage of knowing the value of Y if we are in- 
terested in X. Independence, and its generalization known as conditional 
independence, plays a major role throughout this book. 

An interesting fact, which follows from the definition of conditional prob- 
ability and the axioms of probability measures, is often referred to as Theorem 
of total probability: 


3 p(x |y) ply) 


(discrete case) 


S. 
S, 
I 


J plx | y) ply) dy (continuous case) 


If p(x | y) or p(y) are zero, we define the product p(x | y) p(y) to be zero, 
regardless of the value of the remaining factor. 

Equally important is Bayes rule, which relates a conditional of the type p(z | 
y) to its “inverse,” p(y | x). The rule, as stated here, requires p(y) > 0: 








p(y | x) p(x) ply | x) p(x) : 
p(x | y) mo pu ee) (discrete) 
O wla _ WDA on 
人 


Bayes rule plays a predominant role in probabilistic robotics (and probabilis- 
tic inference in general). If x is a quantity that we would like to infer from y, 
the probability p(x) will be referred to as prior probability distribution, and y 
is called the data (e.g., a sensor measurement). The distribution p(x) summa- 
rizes the knowledge we have regarding X prior to incorporating the data y. 
The probability p(x | y) is called the posterior probability distribution over X. 
As (2.14) suggests, Bayes rule provides a convenient way to compute a pos- 
terior p(x | y) using the “inverse” conditional probability p(y | x) along with 
the prior probability p(x). In other words, if we are interested in inferring 
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a quantity x from sensor data y, Bayes rule allows us to do so through the 
inverse probability, which specifies the probability of data y assuming that 
x was the case. In robotics, the probability p(y | x) is often coined generative 
model, since it describes at some level of abstraction how state variables X 
cause sensor measurements Y. 

An important observation is that the denominator of Bayes rule, p(y), does 
not depend on x. Thus, the factor p(y)~' in Equations (2.13) and (2.14) will 
be the same for any value z in the posterior p(x | y). For this reason, p(y) ! 
is often written as a normalizer in Bayes rule variable, and generically denoted 


n: 
p(z|v) = nply| 2) p(x) 


The advantage of this notation lies in its brevity. Instead of explicitly provid- 
ing the exact formula for a normalization constant—which can grow large 
very quickly in some of the mathematical derivations—we simply will use 
the normalization symbol 7 to indicate that the final result has to be normal- 
ized to 1. Throughout this book, normalizers of this type will be denoted 7 
(or 7’, 7",...). Important: We will freely use the same r in different equations 
to denote normalizers, even if their actual values differ. 

We notice that it is perfectly fine to condition any of the rules discussed 
so far on arbitrary random variables, such as the variable Z. For example, 
conditioning Bayes rule on Z = z gives us: 


p(y | v, 2) plz | 2) 
p(y | 2) 





p(r|ywz) = 


as long as p(y | z) > 0. 
Similarly, we can condition the rule for combining probabilities of inde- 
pendent random variables (2.7) on other variables z: 
p(z,y|z) = plx|z) p(y 2) 
Such a relation is known as conditional independence. As the reader easily 
verifies, (2.17) is equivalent to 
pelz) = pel zy) 
ply | 2) p(y | zx) 


Conditional independence plays an important role in probabilistic robotics. 
It applies whenever a variable y carries no information about a variable x 
if another variable's value z is known. Conditional independence does not 
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imply (absolute) independence, that is, 


p(x,y|z) = p(v|z)p(v|z) v plz,y) = plz) ply) 


The converse is also in general untrue: absolute independence does not im- 
ply conditional independence: 


p(x,y) = p(x) ply) v plz,y|z) = »(x|2z) ply | 2) 


In special cases, however, conditional and absolute independence may coin- 
cide. 

A number of probabilistic algorithms require us to compute features, or 
statistics, of probability distributions. The expectation of a random variable 
X is given by 


5 x p(r) (discrete) 


ty 
Z 
l 


E[X] = jJ x p(x)dx (continuous) 


Not all random variables possess finite expectations; however, those that do 
not are of no relevance to the material presented in this book. 

The expectation is a linear function of a random variable. In particular, we 
have 


ElaX +b] = aE[X]-6 


for arbitrary numerical values a and b. The covariance of X is obtained as 
follows 


Cov[X] = E[X- E[X]] = E[X?]- Ex} 


The covariance measures the squared expected deviation from the mean. As 
stated above, the mean of a multivariate normal distribution N (x; u, X) is p, 
and its covariance is X. 

A final concept of importance in this book is entropy. The entropy of a 
probability distribution is given by the following expression: 


Hy(x) = E|- log p(z)] 
which resolves to 
H,(x) = 一 >》p(z) logyp(x) (discrete) 


= 

m 

Es 
I 


一 / p(x) log; p(x) dx (continuous) 
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Figure 2.1 Robot environment interaction. 


The concept of entropy originates in information theory. The entropy is 
the expected information that the value of x carries. In the discrete case, 
— log, p(x) is the number of bits required to encode x using an optimal en- 
coding, assuming that p(x) is the probability of observing x. In this book, 
entropy will be used in robotic information gathering, so as to express the 
information a robot may receive upon executing specific actions. 


Robot Environment Interaction 


Figure 2.1 illustrates the interaction of a robot with its environment. The 
environment, or world, is a dynamical system that possesses internal state. 
The robot can acquire information about its environment using its sensors. 
However, sensors are noisy, and there are usually many things that cannot 
be sensed directly. As a consequence, the robot maintains an internal belief 
with regards to the state of its environment, depicted on the left in this figure. 

The robot can also influence its environment through its actuators. The 
effect of doing so is often somewhat unpredictable. Hence, each control ac- 
tion affects both the environment state, and the robot’s internal belief with 
regards to this state. 

This interaction will now be described more formally. 
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State 


Environments are characterized by state. For the material presented in this 
book, it will be convenient to think of the state as the collection of all aspects 
of the robot and its environment that can impact the future. Certain state 
variables tend to change over time, such as the whereabouts of people in the 
vicinity of a robot. Others tend to remain static, such as the location of walls 
in (most) buildings. State that changes will be called dynamic state, which dis- 
tinguishes it from static state, or non-changing state. The state also includes 
variables regarding the robot itself, such as its pose, velocity, whether or not 
its sensors are functioning correctly, and so on. 

Throughout this book, state will be denoted x; although the specific vari- 
ables included in x will depend on the context. The state at time t will be 
denoted x+. Typical state variables used throughout this book are: 


* The robot pose, which comprises its location and orientation relative to a 
global coordinate frame. Rigid mobile robots possess six such state vari- 
ables, three for their Cartesian coordinates, and three for their angular 
orientation (pitch, roll, and yaw). For rigid mobile robots confined to pla- 
nar environments, the pose is usually given by three variables, its two 
location coordinates in the plane and its heading direction (yaw). 


* In robot manipulation, the pose includes variables for the configuration of 
the robot's actuators. For example, they might include the joint angles of 
revolute joints. Each degree of freedom in a robot arm is characterized by 
a one-dimensional configuration at any point in time, which is part of the 
kinematic state of the robot. The robot configuration is often referred to 
as kinematic state. 


* The robot velocity and the velocities of its joints are commonly referred to as 
dynamic state. A rigid robot moving through space is characterized by up 
to six velocity variables, one for each pose variables. Dynamic state will 
play only a minor role in this book. 


* The location and features of surrounding objects in the environment are also 
state variables. An object may be a tree, a wall, or a pixel within a larger 
surface. Features of such objects may be their visual appearance (color, 
texture). Depending on the granularity of the state that is being modeled, 
robot environments possess between a few dozen and up to hundreds of 
billions of state variables (and more). Just imagine how many bits it will 
take to accurately describe your physical environment! For many of the 
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problems studied in this book, the location of objects in the environment 
will be static. In some problems, objects will assume the form of land- 
marks, which are distinct, stationary features of the environment that can 
be recognized reliably. 


* The location and velocities of moving objects and people are also potential state 
variables. Often, the robot is not the only moving actor in its environment. 
Other moving entities possess their own kinematic and dynamic state. 


* There are many other state variables that may impact a robot's operation. 
For example, whether or not a sensor is broken can be a state variable, as 
can be the level of battery charge for a battery-powered robot. The list of 
potential state variables is endless! 


A state x; will be called complete if it is the best predictor of the future. Put dif- 
ferently, completeness entails that knowledge of past states, measurements, 
or controls carry no additional information that would help us predict the 
future more accurately. It is important to notice that our definition of com- 
pleteness does not require the future to be a deterministic function of the state. 
The future may be stochastic, but no variables prior to x, may influence 
the stochastic evolution of future states, unless this dependence is mediated 
through the state z;. Temporal processes that meet these conditions are com- 
monly known as Markov chains. 

The notion of state completeness is mostly of theoretical importance. In 
practice, it is impossible to specify a complete state for any realistic robot 
system. A complete state includes not just all aspects of the environment 
that may have an impact on the future, but also the robot itself, the content 
of its computer memory, the brain dumps of surrounding people, etc. Some 
of those are hard to obtain. Practical implementations therefore single out a 
small subset of all state variables, such as the ones listed above. Such a state 
is called incomplete state. 

In most robotics applications, the state is continuous, meaning that x, is 
defined over a continuum. A good example of a continuous state space is 
that of a robot pose, that is, its location and orientation relative to an external 
coordinate system. Sometimes, the state is discrete. An example of a discrete 
state space is the (binary) state variable that models whether or not a sensor 
is broken. State spaces that contain both continuous and discrete variables 
are called hybrid state spaces. 

In most cases of interesting robotics problems, state changes over time. 
Time, throughout this book, will be discrete, that is, all interesting events will 
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take place at discrete time steps t = 0,1,2.... If the robot starts its operation 
at a distinct point in time, we will denote this time as t = 0. 


Environment Interaction 


There are two fundamental types of interactions between a robot and its en- 
vironment: The robot can influence the state of its environment through its 
actuators, and it can gather information about the state through its sensors. 
Both types of interactions may co-occur, but for didactic reasons we will sep- 
arate them throughout this book. The interaction is illustrated in Figure 2.1. 


* Environment sensor measurements. Perception is the process by which 
the robot uses its sensors to obtain information about the state of its envi- 
ronment. For example, a robot might take a camera image, a range scan, 
or query its tactile sensors to receive information about the state of the 
environment. The result of such a perceptual interaction will be called 
a measurement, although we will sometimes also call it observation or per- 
cept. Typically, sensor measurements arrive with some delay. Hence they 
provide information about the state a few moments ago. 


* Control actions change the state of the world. They do so by actively 
asserting forces on the robot's environment. Examples of control actions 
include robot motion and the manipulation of objects. Even if the robot 
does not perform any action itself, state usually changes. Thus, for consis- 
tency, we will assume that the robot always executes a control action, even 
if it chooses not to move any of its motors. In practice, the robot continu- 
ously executes controls and measurements are made concurrently. 


Hypothetically, a robot may keep a record of all past sensor measurements 
and control actions. We will refer to such a collection as the data (regardless 
of whether they are being memorized or not). In accordance with the two 
types of environment interactions, the robot has access to two different data 
streams. 


* Environment measurement data provides information about a momen- 
tary state of the environment. Examples of measurement data include 
camera images, range scans, and so on. For most parts, we will simply 
ignore small timing effects (e.g., most laser sensors scan environments 
sequentially at very high speeds, but we will simply assume the measure- 
ment corresponds to a specific point in time). The measurement data at 
time t will be denoted z;. 
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Throughout most of this book, we simply assume that the robot takes ex- 
actly one measurement at a time. This assumption is mostly for notational 
convenience, as nearly all algorithms in this book can easily be extended 
to robots that can acquire variable numbers of measurements within a 
single time step. The notation 


Zio ty) Sty $1 t1 42» + + Mtg 


denotes the set of all measurements acquired from time t; to time t5, for 
ti X te. 


* Control data carry information about the change of state in the environ- 
ment. In mobile robotics, a typical example of control data is the velocity 
of a robot. Setting the velocity to 10 cm per second for the duration of 
five seconds suggests that the robot’s pose, after executing this motion 
command, is approximately 50 cm ahead of its pose before command ex- 
ecution. Thus, control conveys information regarding the change of state. 


An alternative source of control data are odometers. Odometers are sensors 
that measure the revolution of a robot’s wheels. As such they convey in- 
formation about the change of state. Even though odometers are sensors, 
we will treat odometry as control data, since they measure the effect of a 
control action. 


Control data will be denoted u+. The variable w will always correspond to 
the change of state in the time interval (t — 1; t]. As before, we will denote 
sequences of control data by ut, :ta, for tı < t»: 


Uty:tg — Uti, Ut 1; Ut4 2; Uto 


Since the environment may change even if a robot does not execute a 
specific control action, the fact that time passed by constitutes, techni- 
cally speaking, control information. We therefore assume that there is 
exactly one control data item per time step t, and include as legal action 
"do-nothing". 


The distinction between measurement and control is a crucial one, as both 
types of data play fundamentally different roles in the material yet to 
come. Environment perception provides information about the environ- 
ment's state, hence it tends to increase the robot's knowledge. Motion, on 
the other hand, tends to induce a loss of knowledge due to the inherent noise 
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in robot actuation and the stochasticity of robot environments. By no means 
is our distinction intended to suggest that actions and perceptions are sepa- 
rated in time. Rather, perception and control takes place concurrently. Our 
separation is strictly for convenience. 


Probabilistic Generative Laws 


The evolution of state and measurements is governed by probabilistic laws. 
In general, the state x; is generated stochastically from the state zr, ;. Thus, 
it makes sense to specify the probability distribution from which z+ is gener- 
ated. At first glance, the emergence of state x, might be conditioned on all 
past states, measurements, and controls. Hence, the probabilistic law char- 
acterizing the evolution of state might be given by a probability distribution 
of the following form: p(z; | %o:1-1, 214—1, U1). Notice that through no par- 
ticular motivation we assume here that the robot executes a control action u1 
first, and then takes a measurement 21. 

An important insight is the following: If the state x is complete then it is a 
sufficient summary of all that happened in previous time steps. In particular, 
14.1 is a sufficient statistic of all previous controls and measurements up to 
this point in time, that is, uj4 1 and 21-1. From all the variables in the 
expression above, only the control u; matters if we know the state x;_1. 

In probabilistic terms, this insight is expressed by the following equality: 


p(zi | 204—1; Z1:—1; Ut) = p(zi | Xt—1, Ut) 


The property expressed by this equality is an example of conditional indepen- 
dence. It states that certain variables are independent of others if one knows 
the values of a third group of variables, the conditioning variables. Con- 
ditional independence will be exploited pervasively in this book. It is the 
primary reason why many of the algorithms presented in the book are com- 
putationally tractable. 

One might also want to model the process by which measurements are 
being generated. Again, if x; is complete, we have an important conditional 
independence: 


plz | Go:t, Z14—1, U1:t) = plz | 24) 


In other words, the state x, is sufficient to predict the (potentially noisy) mea- 
surement z+. Knowledge of any other variable, such as past measurements, 
controls, or even past states, is irrelevant if x, is complete. 
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Figure 2.2 The dynamic Bayes network that characterizes the evolution of controls, 
states, and measurements. 


This discussion leaves open as to what the two resulting conditional prob- 
abilities are: p(x; | zi 1, ut) and p(z | x+). The probability p(x; | zii, ut) 
is the state transition probability. It specifies how environmental state evolves 
over time as a function of robot controls u;. Robot environments are sto- 
chastic, which is reflected by the fact that p(x; | x; 1, uz) is a probability 
distribution, not a deterministic function. Sometimes the state transition dis- 
tribution does not depend on the time index t, in which case we may write it 
as p(x’ | u, x), where z' is the successor and z the predecessor state. 

The probability p(z; | z;) is called the measurement probability. It also may 
not depend on the time index t, in which case it shall be written as p(z | 
x). The measurement probability specifies the probabilistic law according 
to which measurements z are generated from the environment state x. It is 
appropriate to think of measurements as noisy projections of the state. 

The state transition probability and the measurement probability together 
describe the dynamical stochastic system of the robot and its environment. 
Figure 2.2 illustrates the evolution of states and measurements, defined 
through those probabilities. The state at time t is stochastically dependent 
on the state at time t — 1 and the control u+. The measurement z; depends 
stochastically on the state at time t. Such a temporal generative model is also 
known as hidden Markov model (HMM) or dynamic Bayes network (DBN). 


Belief Distributions 


Another key concept in probabilistic robotics is that of a belief. A belief re- 
flects the robot's internal knowledge about the state of the environment. We 
already discussed that state cannot be measured directly. For example, a 
robot's pose might be zt = (14.12, 12.7, 45°) in some global coordinate sys- 


»js.on 000000 


20 


INFORMATION STATE 


(2.33) 


(2.34) 


PREDICTION 


2.4 


2.4.1 


BAYES FILTER 


2 Recursive State Estimation 


tem, but it usually cannot know its pose, since poses are not measurable 
directly (not even with GPS!). Instead, the robot must infer its pose from 
data. We therefore distinguish the true state from its internal belief with re- 
gards to that state. Synonyms for belief in the literature are the terms state 
of knowledge and information state (not to be confused with the information 
vector and information matrix discussed below). 

Probabilistic robotics represents beliefs through conditional probability 
distributions. A belief distribution assigns a probability (or density value) to 
each possible hypothesis with regards to the true state. Belief distributions 
are posterior probabilities over state variables conditioned on the available 
data. We will denote belief over a state variable x; by bel(a;), which is an 
abbreviation for the posterior 


bel(zi) = play | 21:2, 914) 


This posterior is the probability distribution over the state x, at time t, con- 
ditioned on all past measurements z1., and all past controls u;.,. 

The reader may notice that we silently assume that the belief is taken after 
incorporating the measurement z;. Occasionally, it will prove useful to cal- 
culate a posterior before incorporating z, just after executing the control uz. 
Such a posterior will be denoted as follows: 


bela) = p(T | 214-1, U1) 


This probability distribution is often referred to as prediction in the context of 
probabilistic filtering. This terminology reflects the fact that bel(z;) predicts 
the state at time ¢ based on the previous state posterior, before incorporating 
the measurement at time t. Calculating bel (x+) from bel (x+) is called correction 
or the measurement update. 


Bayes Filters 


The Bayes Filter Algorithm 


The most general algorithm for calculating beliefs is given by the Bayes filter 
algorithm. This algorithm calculates the belief distribution bel from measure- 
ment and control data. We will first state the basic algorithm and then will 
elucidate it with a numerical example. After that, we will derive it mathe- 
matically from the assumptions made so far. 

Table 2.1 depicts the basic Bayes filter in pseudo-algorithmic form. The 
Bayes filter is recursive, that is, the belief bel(z;) at time t is calculated from 
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Algorithm Bayes filter(bel(z, 1), we, zi): 
for all x, do 
bel(zi) = J p(we | ue, 2421) bel(zi-1) dzia 
bel(xe) = n p(t | £+) bel(ae) 
endfor 
return bel(x+) 











Table 2.1 The general algorithm for Bayes filtering. 


the belief bel(x,_1) at time t — 1. Its input is the belief bel at time t — 1, 
along with the most recent control u; and the most recent measurement z;. Its 
output is the belief bel(z;) at time t. Table 2.1 only depicts a single iteration 
of the Bayes Filter algorithm: the update rule. This update rule is applied 
recursively, to calculate the belief bel(x,) from the belief bel(z,..1), calculated 
previously. 

The Bayes filter algorithm possesses two essential steps. In line 3, it pro- 
cesses the control u. It does so by calculating a belief over the state Zi based 
on the prior belief over state r,..; and the control u+. In particular, the belief 
pel(zz) that the robot assigns to state x; is obtained by the integral (sum) of 
the product of two distributions: the prior assigned to x; .,, and the proba- 
bility that control u; induces a transition from z,. ; to x+. The reader may rec- 
ognize the similarity of this update step to Equation (2.12). As noted above, 
this update step is called the control update, or prediction. 

The second step of the Bayes filter is called the measurement update. In line 
4, the Bayes filter algorithm multiplies the belief bel(x;) by the probability 
that the measurement z, may have been observed. It does so for each hy- 
pothetical posterior state rz. As will become apparent further below when 
actually deriving the basic filter equations, the resulting product is generally 
not a probability. It may not integrate to 1. Hence, the result is normalized, 
by virtue of the normalization constant 7. This leads to the final belief bel (x+), 
which is returned in line 6 of the algorithm. 

To compute the posterior belief recursively, the algorithm requires an ini- 
tial belief bel(xo) at time t = 0 as boundary condition. If one knows the 
value of zo with certainty, bel(xo) should be initialized with a point mass 
distribution that centers all probability mass on the correct value of xo, and 
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Figure 2.3 A mobile robot estimating the state of a door. 


assigns zero probability anywhere else. If one is entirely ignorant about the 
initial value zo, bel(ao) may be initialized using a uniform distribution over 
the domain of xo (or a related distribution from the Dirichlet family of dis- 
tributions). Partial knowledge of the initial value x9 can be expressed by 
non-uniform distributions; however, the two cases of full knowledge and 
full ignorance are the most common ones in practice. 

The algorithm Bayes filter can only be implemented in the form stated here 
for very simple estimation problems. In particular, we either need to be able 
to carry out the integration in line 3 and the multiplication in line 4 in closed 
form, or we need to restrict ourselves to finite state spaces, so that the integral 
in line 3 becomes a (finite) sum. 


Example 


Our illustration of the Bayes filter algorithm is based on the scenario in Fig- 
ure 2.3, which shows a robot estimating the state of a door using its camera. 
To make this problem simple, let us assume that the door can be in one of 
two possible states, open or closed, and that only the robot can change the 
state of the door. Let us furthermore assume that the robot does not know 
the state of the door initially. Instead, it assigns equal prior probability to the 
two possible door states: 


bel(Xo = open) = 0.5 
bel(Xo = closed) = 0.5 


Let us now assume the robot's sensors are noisy. The noise is characterized 
by the following conditional probabilities: 


p(Z, = sense open| X, = is open) = 0.6 
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p(Zi = sense_closed | X, = is open) = 0.4 
and 

p(Z, = sense open| X, = is closed) = 0.2 
p(Z, = sense_closed | X, = is closed) = 0.8 


These probabilities suggest that the robot’s sensors are relatively reliable in 
detecting a closed door, in that the error probability is 0.2. However, when 
the door is open, it has a 0.4 probability of an erroneous measurement. 

Finally, let us assume the robot uses its manipulator to push the door open. 
If the door is already open, it will remain open. If it is closed, the robot has a 
0.8 chance that it will be open afterwards: 


p(X = is_open | U; = push, X, 1 —is open) = 1 
p(X; = is_closed | U; = push, X, 1 =is_open) = 0 
p(X; = is_open | U; = push, X, ; =is_closed) = 0.8 

p(X, = is_closed | U; = push, X, 1 =is_closed) = 0.2 


It can also choose not to use its manipulator, in which case the state of the 
world does not change. This is stated by the following conditional probabil- 
ities: 


p( Xt = is open | U; = do nothing, X, 1 = is open 


) 
p(X; = is closed | U, = do nothing, X, ; = is open) = 
p(X; = is open | U, = do nothing, X, ; = is closed) 


= Oo CO nm 


p(Xı = is closed | U; = do nothing, X, 1 = is closed) = 


Suppose at time t = 1, the robot takes no control action but it senses an open 
door. The resulting posterior belief is calculated by the Bayes filter using 
the prior belief bel( X9), the control u; = do nothing, and the measurement 
sense open as input. Since the state space is finite, the integral in line3 turns 
into a finite sum: 


bel(x1) 
= [oe | u1, zo) pel(zo) dzo 


= >》 p(z: | u1, zo) bel(zo) 


To 
= p(x, | Uı = do nothing, Xo = is open) bel(Xo = is open) 
+ p(zı | U; = do nothing, Xo = is closed) bel(Xo = is closed) 
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We can now substitute the two possible values for the state variable X1. For 
the hypothesis X; = is open, we obtain 
bel(X, = is open) 
= p(X;-is open | U; = do nothing, Xo = is open) 
bel( Xo — is open) 
+ p(X, = is open | U; = do nothing, Xo = is closed) 
bel(Xo = is closed) 
= 1.0.5 二 0.0.5 = 0.5 
Likewise, for X, = is closed we get 
bel(X, = is closed) 
= p(X, = is_closed | U; = do nothing, Xo = is open) 
bel(Xo = is open) 
+ p(X, = is closed | U, = do nothing, Xo = is closed) 
bel( Xo = is closed) 
= 0.0.5 十 1.0.5 = 0.5 
The fact that the belief bel(zi) equals our prior belief bel(zo) should not sur- 
prise, as the action do nothing does not affect the state of the world; neither 
does the world change over time by itself in our example. 
Incorporating the measurement, however, changes the belief. Line 4 of the 
Bayes filter algorithm implies 
bel(ri) = mp(Zi- sense open | zi) bel(x1) 
For the two possible cases, X; — is open and X, — is closed, we get 
bel( Xi = is open) 
= mp(Zi- sense open | X; = is open) bel(X,; = is open) 
= 5g0.6.0.5 = 90.3 
and 
bel( X, = is closed) 
n p(Z1 = sense open | X, = is closed) bel(X; = is closed) 
7 0.2.0.5 = 70.1 


The normalizer rj is now easily calculated: 


n = (03-01)! = 25 
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Hence, we have 
bel(Xi =is_open) = 0.75 
bel(X, = is closed) = 0.25 


This calculation is now easily iterated for the next time step. As the reader 
easily verifies, for uz = push and z2 = sense open we get 





bel(Xo =is_open) = 1-0.75+0.8-0.25 = 0.95 
bel(Xə =is_closed) = 0-0.754+0.2-0.25 = 0.05 
and 

bel(Xo =is_open) = 170.6-0.95 ~ 0.983 
bel(X2 = is closed) = 170.2-0.05 z 0.017 


At this point, the robot believes that with 0.983 probability the door is open. 

At first glance, this probability may appear to be sufficiently high to simply 
accept this hypothesis as the world state and act accordingly. However, such 
an approach may result in unnecessarily high costs. If mistaking a closed 
door for an open one incurs costs (e.g., the robot crashes into a door), con- 
sidering both hypotheses in the decision making process will be essential, as 
unlikely as one of them may be. Just imagine flying an aircraft on auto pilot 
with a perceived chance of 0.983 for not crashing! 


Mathematical Derivation of the Bayes Filter 


The correctness of the Bayes filter algorithm is shown by induction. To 
do so, we need to show that it correctly calculates the posterior distribu- 
tion p(x; | 214,14) from the corresponding posterior one time step earlier, 
p(za—ài | zi4—1, u14—1). The correctness follows then by induction under the 
assumption that we correctly initialized the prior belief bel(zo) at time t = 0. 

Our derivation requires that the state x; is complete, as defined in Chap- 
ter 2.3.1, and it requires that controls are chosen at random. The first step 
of our derivation involves the application of Bayes rule (2.16) to the target 
posterior: 


p(zi | Tt, Ž1:t—1; id) p(zi | Zit—1; Mt) 
plz | Z1:t—1; Wie) 


= n p(z | Lt, 2141, U1:) p(x | Z1:t—1, Ur) 





p(x | Z1 sts Ut) 


We now exploit the assumption that our state is complete. In Chapter 2.3.1, 
we defined a state zt to be complete if no variables prior to x; may influence 
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the stochastic evolution of future states. In particular, if we (hypothetically) 
knew the state x; and were interested in predicting the measurement z+, no 
past measurement or control would provide us additional information. In 
mathematical terms, this is expressed by the following conditional indepen- 
dence: 


plz | Ge, Z14—1, 01:4) = plz | x+) 


Such a statement is another example of conditional independence. It allows us 
to simplify (2.35) as follows: 


p(zi | Zit Ur:t) m n p(zi | Tt) p(x | giis) 
and hence 
bel(z,) = 1 p(z | xt) bel(x:) 


This equation is implemented in line 4 of the Bayes filter algorithm in Ta- 
ble 2.1. 
Next, we expand the term bel(«;), using (2.12): 


bel (xt) = p(t | 214-1, U1) 
= [ve | 241, Ž1:t—1; U1st) p(zaa | 21-1, U1:t) dxi—4 


Once again, we exploit the assumption that our state is complete. This im- 
plies if we know 2;_1, past measurements and controls convey no informa- 
tion regarding the state r;. This gives us 


p(zi | Lt—1; Z1:t-1, Ut) = plz | Lt-1, Ut) 


Here we retain the control variable u;, since it does not predate the state x;_,. 
In fact, the reader should quickly convince herself that p(x; | z; 1,u1) 天 
p(xt | zii). 

Finally, we note that the control u; can safely be omitted from the set of 
conditioning variables in p(zi.1 | 2141, 1:4) for randomly chosen controls. 
This gives us the recursive update equation 


bel(z,)) = ILES p(zi-1 | 1451, 914-3) dzt 1 





As the reader easily verifies, this equation is implemented by line 3 of the 
Bayes filter algorithm in Table 2.1. 

To summarize, the Bayes filter algorithm calculates the posterior over the 
state x; conditioned on the measurement and control data up to time t. The 
derivation assumes that the world is Markov, that is, the state is complete. 
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Any concrete implementation of this algorithm requires three probability 
distributions: The initial belief p(x), the measurement probability p(z; | zi), 
and the state transition probability p(x; | ur, Zit_1). We have not yet specified 
these densities for actual robot systems. But we will soon: Chapter 5 is en- 
tirely dedicated to p(x; | ut, z;.1) and Chapter 6 to p(z | x+). We also need 
a representation for the belief bel(z;), which will be discussed in Chapters 3 
and 4. 


The Markov Assumption 


A word is in order on the Markov assumption, or the complete state assumption, 
since it plays such a fundamental role in the material presented in this book. 
The Markov assumption postulates that past and future data are indepen- 
dent if one knows the current state x+. To see how severe an assumption this 
is, let us consider our example of mobile robot localization. In mobile robot 
localization, x+ is the robot's pose, and Bayes filters are applied to estimate 
the pose relative to a fixed map. The following factors may have a syste- 
matic effect on sensor readings. Thus, they induce violations of the Markov 
assumption: 


e Unmodeled dynamics in the environment not included in Zr (e.g., mov- 
ing people and their effects on sensor measurements in our localization 
example), 


e inaccuracies in the probabilistic models p(2 | zt) and p(zi | ut, 211) (e.g. 
an error in the map for a localizing robot), 


* approximation errors when using approximate representations of belief 
functions (e.g., grids or Gaussians, which will be discussed below), and 


* software variables in the robot control software that influence multiple 
controls (e.g., the variable "target location" typically influences an entire 
sequence of control commands). 


In principle, many of these variables can be included in state representations. 
However, incomplete state representations are often preferable to more com- 
plete ones to reduce the computational complexity of the Bayes filter algo- 
rithm. In practice, Bayes filters have been found to be surprisingly robust 
to such violations. As a general rule of thumb, however, one should exer- 
cise care when defining the state x+, so that the effect of unmodeled state 
variables has close-to-random effects. 
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Representation and Computation 


In probabilistic robotics, Bayes filters are implemented in several different 
ways. As we will see in the next two chapters, there exist quite a variety of 
techniques and algorithms that are all derived from the Bayes filter. Each 
such technique relies on different assumptions regarding the measurement 
and state transition probabilities and the initial belief. These assumptions 
then give rise to different types of posterior distributions, and the algorithms 
for computing them have different computational characteristics. As a gen- 
eral rule of thumb, exact techniques for calculating beliefs exist only for 
highly specialized cases; in general robotics problems, beliefs have to be ap- 
proximated. The nature of the approximation has important ramifications on 
the complexity of the algorithm. Finding a suitable approximation is usually 
a challenging problem, with no unique best answer for all robotics problems. 

When choosing an approximation, one has to trade off a range of proper- 
ties: 


1. Computational efficiency. Some approximations, such as linear Gaus- 
sian approximations that will be discussed further below, make it possi- 
ble to calculate beliefs in time polynomial in the dimension of the state 
space. Others may require exponential time. Particle-based techniques, 
discussed further below, have an any-time characteristic, enabling them to 
trade off accuracy with computational efficiency. 


2. Accuracy of the approximation. Some approximations can approximate a 
wider range of distributions more tightly than others. For example, linear 
Gaussian approximations are limited to unimodal distributions, whereas 
histogram representations can approximate multi-modal distributions, al- 
beit with limited accuracy. Particle representations can approximate a 
wide array of distributions, but the number of particles needed to attain a 
desired accuracy can be large. 


3. Ease of implementation. The difficulty of implementing probabilistic al- 
gorithms depends on a variety of factors, such as the form of the mea- 
surement probability p(z; | z;) and the state transition probability p(z; | 
ut, Zt_1) Particle representations often yield surprisingly simple imple- 
mentations for complex nonlinear systems—one of the reasons for their 
recent popularity. 


The next two chapters will introduce concrete implementable algorithms, 
which fare quite differently relative to the criteria described above. 
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Summary 


In this section, we introduced the basic idea of Bayes filters in robotics, as a 
means to estimate the state of an environment and the robot. 


* The interaction of a robot and its environment is modeled as a coupled 
dynamical system, in which the robot manipulates its environment by 
choosing controls, and in which it can perceive the environment through 
its sensors. 


* In probabilistic robotics, the dynamics of the robot and its environment 
are characterized in the form of two probabilistic laws: the state transition 
distribution, and the measurement distribution. The state transition dis- 
tribution characterizes how state changes over time, possibly as the effect 
of robot controls. The measurement distribution characterizes how mea- 
surements are governed by states. Both laws are probabilistic, accounting 
for the inherent uncertainty in state evolution and sensing. 


* The belief of a robot is the posterior distribution over the state of the en- 
vironment (including the robot state) given all past sensor measurements 
and all past controls. The Bayes filter is the principal algorithm for calcu- 
lating the belief in robotics. The Bayes filter is recursive; the belief at time 
t is calculated from the belief at time t — 1. 


* The Bayes filter makes a Markov assumption according to which the state 
is a complete summary of the past. This assumption implies the belief is 
sufficient to represent the past history of the robot. In robotics, the Markov 
assumption is usually only an approximation. We identified conditions 
under which it is violated. 


* Since the Bayes filter is not a practical algorithm, in that it cannot be im- 
plemented on a digital computer, probabilistic algorithms use tractable 
approximations. Such approximations may be evaluated according to dif- 
ferent criteria, relating to their accuracy, efficiency, and ease of implemen- 
tation. 


The next two chapters discuss two popular families of recursive state estima- 
tion techniques that are both derived from the Bayes filter. 
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Bibliographical Remarks 


The basic statistical material in this chapter is covered in most introductory textbooks to prob- 
ability and statistics. Some early classical texts by DeGroot (1975), Subrahmaniam (1979), and 
Thorp (1966) provide highly accessible introductions into this material. More advanced treat- 
ments can be found in (Feller 1968; Casella and Berger 1990; Tanner 1996), and in (Devroye et al. 
1996; Duda et al. 2000). The robot environment interaction paradigm is common in robotics. It 
is discussed from the AI perspective by Russell and Norvig (2002). 


Exercises 


1. A robot uses a range sensor that can measure ranges from 0m and 3m. For 
simplicity, assume that actual ranges are distributed uniformly in this in- 
terval. Unfortunately, the sensor can be faulty. When the sensor is faulty, 
it constantly outputs a range below 1m, regardless of the actual range in 
the sensor’s measurement cone. We know that the prior probability for a 
sensor to be faulty is p = 0.01. 


Suppose the robot queried its sensor N times, and every single time the 
measurement value is below 1m. What is the posterior probability of a 
sensor fault, for N = 1,2,...,10. Formulate the corresponding proba- 
bilistic model. 


2. Suppose we live at a place where days are either sunny, cloudy, or rainy. 
The weather transition function is a Markov chain with the following tran- 
sition table: 














tomorrow will be... 
sunny cloudy rainy 
sunny 8 2 0 
today it's... cloudy .4 .4 .2 
rainy 32. .0 2 








(a) Suppose Day 1 is a sunny day. What is the probability of the following 
sequence of days: Day2 = cloudy, Day3 = cloudy, Day4 = rainy? 

(b) Write a simulator that can randomly generate sequences of “weathers” 
from this state transition function. 

(c) Use your simulator to determine the stationary distribution of this 
Markov chain. The stationary distribution measures the probability 
that a random day will be sunny, cloudy, or rainy. 

(d) Can you devise a closed-form solution to calculating the stationary dis- 
tribution based on the state transition matrix above? 
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(e) What is the entropy of the stationary distribution? 


(f) Using Bayes rule, compute the probability table of yesterday’s weather 
given today’s weather. (It is okay to provide the probabilities numer- 
ically, and it is also okay to rely on results from previous questions in 
this exercise.) 


NH 


Suppose we added seasons to our model. The state transition function 
above would only apply to the Summer, whereas different ones would 
apply to Winter, Spring, and Fall. Would this violate the Markov prop- 
erty of this process? Explain your answer. 


(g 


3. Suppose that we cannot observe the weather directly, but instead rely on 
a sensor. The problem is that our sensor is noisy. Its measurements are 
governed by the following measurement model: 





our sensor tells us... 
sunny cloudy rainy 














sunny 6 A 0 
the actual weather is... cloudy 3 .7 0 
rainy 0 0 1 





(a) Suppose Day 1 is sunny (this is known for a fact), and in the subsequent 
four days our sensor observes cloudy, cloudy, rainy, sunny. What is the 
probability that Day 5 is indeed sunny as predicted by our sensor? 


(b) Once again, suppose Day 1 is known to be sunny. At Days 2 through 4, 
the sensor measures sunny, sunny, rainy. For each of the Days 2 through 
4, what is the most likely weather on that day? Answer the question 
in two ways: one in which only the data available to the day in ques- 
tion is used, and one in hindsight, where data from future days is also 
available. 


—- 
eo 
— 


Consider the same situation (Day 1 is sunny, the measurements for 
Days 2, 3, and 4 are sunny, sunny, rainy). What is the most likely se- 
quence of weather for Days 2 through 4? What is the probability of 
this most likely sequence? 


4. In this exercise we will apply Bayes rule to Gaussians. Suppose we are a 
mobile robot who lives on a long straight road. Our location x will simply 
be the position along this road. Now suppose that initially, we believe to 
be at location zinit = 1,000m, but we happen to know that this estimate 
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is uncertain. Based on this uncertainty, we model our initial belief by a 
Gaussian with variance o2,,, = 900m?. 


To find out more about our location, we query a GPS receiver. The GPS 
tells us our location is zaps — 1,100m. This GPS receiver is known to 
have an error variance of o2, = 100m?. 


(a) Write the probability density functions of the prior p(x) and the mea- 
surement p(z | x). 


(b) Using Bayes rule, what is the posterior p(x | z)? Can you prove it to be 
Gaussian? 


(c) How likely was the measurement zaps = 1, 100m given our prior, and 
knowledge of the error probability of our GPS receiver? 


Hint: This is an exercise in manipulating quadratic expressions. 


5. Derive Equations (2.18) and (2.19) from (2.17) and the laws of probability 
stated in the text. 


6. Prove Equation (2.25). What are the implications of this equality? 
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Introduction 


This chapter describes an important family of recursive state estimators, col- 
lectively called Gaussian filters. Historically, Gaussian filters constitute the 
earliest tractable implementations of the Bayes filter for continuous spaces. 
They are also by far the most popular family of techniques to date—despite 
a number of shortcomings. 

Gaussian techniques all share the basic idea that beliefs are represented by 
multivariate normal distributions. We already encountered a definition of 
the multivariate normal distribution in Equation (2.4), which is restated here 
for convenience: 


p(w) = det (2n5)~2 exp(-i(s— gu) X^! (s — p)} 


This density over the variable x is characterized by two sets of parameters: 
The mean pz and the covariance X. The mean p is a vector that possesses the 
same dimensionality as the state x. The covariance is a quadratic matrix that 
is symmetric and positive-semidefinite. Its dimension is the dimensionality 
of the state x squared. Thus, the number of elements in the covariance matrix 
depends quadratically on the number of elements in the state vector. 

The commitment to represent the posterior by a Gaussian has important 
ramifications. Most importantly, Gaussians are unimodal; they possess a sin- 
gle maximum. Such a posterior is characteristic of many tracking problems 
in robotics, in which the posterior is focused around the true state with a 
small margin of uncertainty. Gaussian posteriors are a poor match for many 
global estimation problems in which many distinct hypotheses exist, each of 
which forms its own mode in the posterior. 

The parameterization of a Gaussian by its mean and covariance is called 
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3.2.1 


GAUSSIAN POSTERIOR 


3 Gaussian Filters 


the moments parameterization. This is because the mean and covariance are the 
first and second moments of a probability distribution; all other moments are 
zero for normal distributions. In this chapter, we will also discuss an alter- 
native parameterization, called canonical parameterization, or sometimes natu- 
ral parameterization. Both parameterizations, the moments and the canonical 
parameterizations, are functionally equivalent in that a bijective mapping 
exists that transforms one into the other. However, they lead to filter algo- 
rithms with somewhat different computational characteristics. As we shall 
see, the canonical and the natural parameterizations are best thought of as 
duals: what appears to be computationally easy in one parameterization is 
involved in the other, and vice versa. 
This chapter introduces the two basic Gaussian filter algorithms. 


* Chapter 32 describes the Kalman filter, which implements the Bayes fil- 
ter using the moments parameterization for a restricted class of problems 
with linear dynamics and measurement functions. 


* The Kalman filter is extended to nonlinear problems in Chapter 3.3, which 
describes the extended Kalman filter. 


* Chapter 3.4 describes a different nonlinear Kalman filter, known as un- 
scented Kalman filter. 


* Chapter 3.5 describes the information filter, which is the dual of the 
Kalman filter using the canonical parameterization of Gaussians. 


The Kalman Filter 


Linear Gaussian Systems 


Probably the best studied technique for implementing Bayes filters is the 
Kalman filter, or (KF). The Kalman filter was invented by Swerling (1958) and 
Kalman (1960) as a technique for filtering and prediction in linear Gaussian 
systems, which will be defined in a moment. The Kalman filter implements 
belief computation for continuous states. It is not applicable to discrete or 
hybrid state spaces. 

The Kalman filter represents beliefs by the moments parameterization: At 
time f£, the belief is represented by the the mean ju, and the covariance X;. 
Posteriors are Gaussian if the following three properties hold, in addition to 
the Markov assumptions of the Bayes filter. 
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1. The state transition probability p(x; | ut, x+-1) must be a linear function 


in its arguments with added Gaussian noise. This is expressed by the 
following equation: 


Tt = Ama + Biu d 6 


Here x; and z,. , are state vectors, and u, is the control vector at time t. 
In our notation, both of these vectors are vertical vectors. They are of the 
form 


Tit U1,t 

X21 U2,t 
t= and w= 

Tnt Um,t 


A, and B, are matrices. A, is a square matrix of size n x n, where n is 
the dimension of the state vector x,. D, is of size n x m, with m being the 
dimension of the control vector u,. By multiplying the state and control 
vector with the matrices A; and Dj, respectively, the state transition func- 
tion becomes linear in its arguments. Thus, Kalman filters assume linear 
system dynamics. 


The random variable e, in (3.2) is a Gaussian random vector that mod- 
els the uncertainty introduced by the state transition. It is of the same 
dimension as the state vector. Its mean is zero, and its covariance will 
be denoted R;. A state transition probability of the form (3.2) is called a 
linear Gaussian, to reflect the fact that it is linear in its arguments with addi- 
tive Gaussian noise. Technically, one may also include a constant additive 
term in (3.2), which is here omitted since it plays no role in the material to 
come. 


Equation (3.2) defines the state transition probability p(x, | ut, x; 1). This 
probability is obtained by plugging Equation (3.2) into the definition of 
the multivariate normal distribution (3.1). The mean of the posterior state 
is given by Aiz;.1 + Brus and the covariance by R;: 


l 
p(ri|u,zi-1) = det (27R,) 2 
exp (732i — Auria — Bou)! Ry (24 — Aia — Bru) } 
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(3.6) 


(3.7) 
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1: Algorithm Kalman filter(u; 1, 3; 4, Ut, zt): 
2: Bi = At pui + Bru 

3: y» 一 A, 34 AT 十 Ri 

4: K: = X. er, X. OF + Qt 

5: pi = Bt + Kia — Ci ju) 

6: vet = (I = Ki Ci) Mt 

7: return p, Xt 











Table3.1 The Kalman filter algorithm for linear Gaussian state transitions and mea- 
surements. 


2. The measurement probability p(z; | x+) must also be linear in its argu- 
ments, with added Gaussian noise: 


zt = Cyr, 十 Ot 


Here C is a matrix of size k x n, where k is the dimension of the measure- 
ment vector z+. The vector 6; describes the measurement noise. The distri- 
bution of ô; is a multivariate Gaussian with zero mean and covariance Q}. 
The measurement probability is thus given by the following multivariate 
normal distribution: 


pla |x) = det (21Q,) 3 exp [- 1(z — Ct a)? Qc! (s — Cr z)) 


3. Finally, the initial belief bel(zo) must be normally distributed. We will 
denote the mean of this belief by /Ho and the covariance by Xo: 


bel(xo) =  p(xo) = det (253) 3 exp {—4 (xo — uo)" Xi! (zo — uo)! 


These three assumptions are sufficient to ensure that the posterior bel(z;) 
is always a Gaussian, for any point in time t. The proof of this non-trivial 
result can be found below, in the mathematical derivation of the Kalman 
filter (Chapter 3.2.4). 
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The Kalman Filter Algorithm 


The Kalman filter algorithm is depicted in Table 3.1. Kalman filters represent 
the belief bel(z,) at time t by the mean u; and the covariance X,;. The input 
of the Kalman filter is the belief at time t — 1, represented by 4-1 and X. 
To update these parameters, Kalman filters require the control u; and the 
measurement z;. The output is the belief at time t, represented by ju; and X;. 

In lines 2 and 3, the predicted belief j; and X is calculated representing 
the belief bel(z;) one time step later, but before incorporating the measure- 
ment z;. This belief is obtained by incorporating the control us. The mean 
is updated using the deterministic version of the state transition function 
(3.2), with the mean 1,1 substituted for the state Zt 1. The update of the co- 
variance considers the fact that states depend on previous states through the 
linear matrix A+. This matrix is multiplied twice into the covariance, since 
the covariance is a quadratic matrix. 

The belief bel(r,) is subsequently transformed into the desired belief 
bel(x+) in lines 4 through 6, by incorporating the measurement z+. The vari- 
able K;, computed in line 4 is called Kalman gain. It specifies the degree 
to which the measurement is incorporated into the new state estimate, in a 
way that will become clearer in Chapter 3.2.4. Line 5 manipulates the mean, 
by adjusting it in proportion to the Kalman gain K, and the deviation of 
the actual measurement, z;, and the measurement predicted according to the 
measurement probability (3.5). The key concept here is the innovation, which 
is the difference between the actual measurement z, and the expected mea- 
surement C, ji, in line 5. Finally, the new covariance of the posterior belief 
is calculated in line 6, adjusting for the information gain resulting from the 
measurement. 

The Kalman filter is computationally quite efficient. For today's best algo- 
rithms, the complexity of matrix inversion is approximately O(d?^) for a ma- 
trix of size d x d. Each iteration of the Kalman filter algorithm, as stated here, 
is lower bounded by (approximately) O(k? 4), where k is the dimension of the 
measurement vector z;. This (approximate) cubic complexity stems from the 
matrix inversion in line 4. Even for certain sparse updates discussed in fu- 
ture chapters, it is also at least in O(n?), where n is the dimension of the state 
space, due to the multiplication in line 6 (the matrix K,C, may be sparse). 
In many applications—such as the robot mapping applications discussed in 
later chapters—the measurement space is much lower dimensional than the 
state space, and the update is dominated by the O(n?) operations. 
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5 10 


Figure3.2 Illustration of Kalman filters: (a) initial belief, (b) a measurement (in bold) 
with the associated uncertainty, (c) belief after integrating the measurement into the 
belief using the Kalman filter algorithm, (d) belief after motion to the right (which 
introduces uncertainty), (e) a new measurement with associated uncertainty, and (f) 
the resulting belief. 


Illustration 


Figure 3.2 illustrates the Kalman filter algorithm for a simplistic one- 
dimensional localization scenario. Suppose the robot moves along the hori- 
zontal axis in each diagram in Figure 3.2. Let the prior over the robot location 
be given by the normal distribution shown in Figure 3.2a. The robot queries 
its sensors on its location (e.g., a GPS system), and those return a measure- 
ment that is centered at the peak of the bold Gaussian in Figure 3.2b. This 
bold Gaussian illustrates this measurement: Its peak is the value predicted 
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by the sensors, and its width (variance) corresponds to the uncertainty in 
the measurement. Combining the prior with the measurement, via lines 4 
through 6 of the Kalman filter algorithm in Table 3.1, yields the bold Gaus- 
sian in Figure 3.2c. This belief’s mean lies between the two original means, 
and its uncertainty radius is smaller than both contributing Gaussians. The 
fact that the residual uncertainty is smaller than the contributing Gaussians 
may appear counter-intuitive, but it is a general characteristic of information 
integration in Kalman filters. 

Next, assume the robot moves towards the right. Its uncertainty grows 
due to the fact that the state transition is stochastic. Lines 2 and 3 of the 
Kalman filter provide us with the Gaussian shown in bold in Figure 3.2d. 
This Gaussian is shifted by the amount the robot moved, and it is also wider 
for the reasons just explained. The robot receives a second measurement 
illustrated by the bold Gaussian in Figure 3.2e, which leads to the posterior 
shown in bold in Figure 3.2f. 

As this example illustrates, the Kalman filter alternates a measurement up- 
date step (lines 5-7), in which sensor data is integrated into the present belief, 
with a prediction step (or control update step), which modifies the belief in 
accordance to an action. The update step decreases and the prediction step 
increases uncertainty in the robot's belief. 


Mathematical Derivation of the KF 


This section derives the Kalman filter algorithm in Table 3.1. The section can 
safely be skipped at first reading; it is only included for completeness. 

Up front, the derivation of the KF is largely an exercise in manipulating 
quadratic expressions. When multiplying two Gaussians, for example, the 
exponents add. Since both original exponents are quadratic, so is the result- 
ing sum. The remaining exercise is then to come up with a factorization of the 
result into a form that makes it possible to read off the desired parameters. 


Part 1: Prediction 
Our derivation begins with lines 2 and 3 of the algorithm, in which the belief 


bel(x+) is calculated from the belief one time step earlier, bel(z, 1). Lines 2 
and 3 implement the update step described in Equation (2.41), restated here 
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(3.9) 


(3.10) 


(3.11) 


(3.12) 


(3.13) 
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for the reader's convenience: 


bel(z,) = J p(zi | xii, Ue) bel(xı—1) dzi—1 
—— ——— 
ON (x4;At£t-1+Btut, Ri) ~N (riciipici, Yd) 
The belief bel(x;_1) is represented by the mean ji... and the covariance X, 1. 
The state transition probability p(x; | x:-1, uz) was given in (3.4) as a normal 
distribution over x; with mean A;r;.; + Bius and covariance R;. As we 
shall show now, the outcome of (3.8) is again a Gaussian with mean jj; and 
covariance »; as stated in Table 3.1. 
We begin by writing (3.8) in its Gaussian form: 


bel(z;) 


= T] for (Cd (£t = At X1—] — Bi, us)? R;! (£t At Xi1—1 — DB, ur)} 





exp [75 (zii — Me 1) Dph (t41 — He 1)} dzca 
In short, we have 
bel(zi) = qm J evit da 
with 
Ly = i(m-A t1- Bi ut)? Ri! (xi — Aczia — By ut) 
十 i (ti-1— fea)? gl (ea — pia) 





Notice that L; is quadratic in z;..1; it is also quadratic in z+. 

Expression (3.10) contains an integral. Solving this integral requires us to 
reorder the terms in this interval, in a way that might appear counterintuitive 
at first. In particular, we will decompose Pr into two functions, Li(zi 1,2) 
and L(x): 


Ly = Le(xe-1, £) + Lia) 


This decomposition will simply be the result of reordering the terms in L;. 
A key goal of this decomposition step shall be that the variables in L; are 
partitioned into two sets, of which only one will depend on the variable z,. ,. 
The other, L;(z;), will not depend on z, 1. As a result, we will be able to 
move the latter variables out of the integral over the variable z,. i. 

This is illustrated by the following transformation: 


bel(zi) = m fvit da 
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n J Soto) — Li(z4)). dzt il 


n exp {—Lilx)} J Soto dai 


Of course, there exist many ways to decompose L, into two sets that would 
meet this criterion. The key insight is that we will choose L;(x,_1, £+) such 
that the value of the integral in (3.13) does not depend on z;. If we succeed 
in defining such a function L;(z;..1, £+), the entire integral over Li(zi 1,24) 
will simply become a constant relative to the problem of estimating the be- 
lief distribution over x+. Constants are usually captured in the normalization 
constant 7, so under our decomposition we will be able to subsume this con- 
stant into 7 (now for a different actual value of 7 as above): 


bel(z, = mexpí(-Li(zi)) 


Thus, our decomposition would make it possible to eliminate the integral 
from the belief (3.10). The result is just a normalized exponential over a 
quadratic function, which turns out to be a Gaussian. 

Let us now perform this decomposition. "We are seeking a function 
Li(£i—1, £4) quadratic in x;_,. (This function will also depend on z+, but 
that shall not concern us at this point.) To determine the coefficients of this 
quadratic, we calculate the first two derivatives of L;: 











OL _ 2 
dz » = -AT Tü ^ (x, — Ay 241 — Bi ut) + Uy (£t—1 = Pea) 
t— 
Or Lı E z _ 
adc U^ A R Ay 4x Se y) 


V, defines the curvature of L;(x;_1, £+). Setting the first derivative of L; to 0 
gives us the mean: 


AT R7' (xi — Aptis — Bau) = Ez} Gea) 





This expression is now solved for x;..1 





AT R7! (a, — Bi u) — AT Ry’ Aim 一 Mt d pe mee 
AT R;! At x14 + br) M1 = AT Re (a, — By ut) + x Ht 1 
(AT Rp? Ar Eph) tei = AP Rpt (zo By ue) + Y ua 
Wilma = APR (n—Bow)tYS ua 

zi; = Wy [AP Ry? (a, — By u) + Eph uuu] 


[1111 
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(3.19) 


(3.20) 


(3.21) 


(3.22) 


(3.23) 


(3.24) 
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Thus, we now have a quadratic function L;(x;..1, £+), defined as follows: 
5 (e1 — Ve [AT Ry" (we — Be uj) + Xt teal)? 7 
(zi — Vc [AP Re? (ae — Be ue) + Ey Ai) 


Lr m) = 


Clearly, this is not the only quadratic function satisfying our decomposition 
in (3.12). However, Li(zi.1, £+) is of the common quadratic form of the neg- 
ative exponent of a normal distribution. In fact the function 

1 
det(27W)~ 2 exp(—Li(z41,24)] 
is a valid probability density function (PDF) for the variable zt_ 1，As the 
reader easily verifies, this function is of the form defined in (3.1). We know 
from (2.5) that PDFs integrate to 1. Thus, we have 


1 
J sam? exp(—-L;(z,.1,2,)] dz- = 1 
From this it follows that 
1 
J Seton) dx, 1 = det(2rW)2 


The important thing to notice is that the value of this integral is independent 
of x,, our target variable. Thus, for our problem of calculating a distribution 
over z;, this integral is constant. Subsuming this constant into the normalizer 
n, we get the following expression for Equation (3.13): 


bel(r;)) = n exp{—L;(2;)} [exp {Lelia} d£t—1 
=  exp{—L;(zz)} 
This decomposition establishes the correctness of (3.14). Notice once again 
that the normalizers 7 are not the same in both lines. 
It remains to determine the function L;(r;), which is the difference of L;, 
defined in (3.11), and Li (x:_1, x+), defined in (3.19): 
Li (xt) = h- Li(zi-1, 24) 
= § (x;— A t1- Bi ut)’ Rz! (we — Ac zii — Bi ut) 
十 4 (zi-1 — fyi)? Eg Gia ua) 
—i(n.i- WV. [AT R7" (zx — Be ue) + Xl nep? v^! 
(zi-1 — V [AP R7" (zi — Bi ui) + EE, AI 





Let us quickly verify that L;(x;) indeed does not depend on z;..,. To do so, 
we substitute back V, = (AT R;! A, + X; )-!, and multiply out the terms 
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(3.28) 
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above. For the reader’s convenience, terms that contain r;..; are underlined 
(doubly if they are quadratic in z,. 1). 


Lt (xt) = i gh AF Ryt At X1—1 — zl AT RB! (xe = Bi, Ut) 








ap le (a, — B, us)? Re (ay — B, Ut) 
十 二 zi Dy tea — hy Xa tiu 1X una 


=s ira (AT Re As Da) Tt-1 




















tal, [AP Ry? (a: — By ut) + Ezt cu] 
—5 [A] Ry" (x, — Bru) + Dy meal” (AP Ry A+ Erh)! 
[AT Ry" (x — Be ue) + Eri mı] 





It is now easily seen that all terms that contain x;-; cancel out. This 
should come at no surprise, since it is a consequence of our construction 
of Li (£t—1, zt). 
Lila) = +4 (te — Biu)” Rpt (ae — Be ue) +5 iia Eg Hei 
-i [Af RF" (we — Be te) + Yl meal” (AP Rp? Ae + Eph)! 
[AT Ry? (£i — Be u) + Ezt Ac 
Furthermore, L,(r;) is quadratic in z+. This observation means that bel(z;) is 
indeed normal distributed. The mean and covariance of this distribution are 
of course the minimum and curvature of L(x), which we now easily obtain 
by computing the first and second derivatives of L;(x;) with respect to x+: 
OL,(x B = E ETE 
E 2 - Roo (@e— Bow) -RSOAQAPRS ARES) 
t 
[Ap Rr" (zi — By ui) + Ert ui] 
= [RS - RS Ap (AP Ry? As + Up)" AT Re") (we — By w) 
— Ry’ As (AP Re? Ar + Ez) Xx uA 





The inversion lemma stated (and shown) in Table 3.2 allows us to express the 
first factor as follows: 

Ry'— Rpt Ay (AP Rpt Ap + Eh) AT Rpt = (Rp + Ay Xii AD 
Hence the desired derivative is given by the following expression: 


OL, (x+) 


De, = (Ri + Ap Xi- ATyS (zi — Bi ui) 





—R,* At (AF Ry At E ML iD : EA Pt—1 
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Inversion Lemma. For any invertible quadratic matrices R and Q and any 
matrix P with appropriate dimensions, the following holds true 


(R--PQP'j! = gp'-R'P(Q'-P'R'Py'rmp 


assuming that all above matrices can be inverted as stated. 


Proof. Define V = (Q^! + PT R^! P)-!. It suffices to show that 
(R'i—-R'PWP'R-(R-PQP = I 
This is shown through a series of transformations: 
= R!R«mR'!PQP'-mR!PVP'R?!R 
Sam’ —— 
zr zu 
—R'PWUP’R'PQPT 
= [+RkR'PQP'-R'PUP’-R'PUPTR'PQPT 
= I+R'P(QP? —wP? — YP” R' PQP] 
= I+ R'PJ|QPT —- Y QQ PT — UP’ R! PQP] 
—— 
—I 
E 一 1 T -1 T 
= I+R'P(QPT - Ww QP’ 
—I 
= I-R|iPQP'-qrP-r 
M —MÁ— 


一 0 

















Table 3.2 The (specialized) inversion lemma, sometimes called the Sher- 
man/Morrison formula. 


The minimum of Zr(zi) is attained when the first derivative is zero. 


(Ri + Ap Xa AT)! (xı — Bi u) 
= BS A (AP Rpt Ar + Eh) EZH aua 





Solving this for the target variable x; gives us the surprisingly compact result 





z; = Biu + (Ri + Ap Yii Ap) Ry? Ae (AP RU As + Eh) X ua 

4t+4 Dt-1 ATR]! A Qu-iAT Rj! AI)! 

一 B, Ut T A, (I+ Besi ATR! At) ie Bee A, 十 Dt Ht—1 
一 一 


=U 





5rjs.cn 000000 


(3.32) 


(3.33) 


(3.34) 


(3.35) 


3.2 The Kalman Filter 51 


= B, ui At pua 


Thus, the mean of the belief bel (x+) after incorporating the motion command 
ut is By ui + At pui. This proves the correctness of line 2 of the Kalman filter 
algorithm in Table 3.1. 

Line 3 is now obtained by calculating the second derivative of L;(x;): 


O^ Lt (x+) 


ARD = QE AFHR) 


This is the curvature of the quadratic function L;(x;), whose inverse is the 
covariance of the belief bel(z;). 

To summarize, we showed that the prediction steps in lines 2 and 3 of the 
Kalman filter algorithm indeed implement the Bayes filter prediction step. 
To do so, we first decomposed the exponent of the belief bel(z;) into two 
functions, L;(z, 1,24) and L,(a;). Then we showed that L;(x+_1, x4) changes 
the predicted belief bel (x+) only by a constant factor, which can be subsumed 
into the normalizing constant 7. Finally, we determined the function L;(z;) 
and showed that it results in the mean ji; and covariance X, of the Kalman 
filter prediction bel (x+). 


Part 2: Measurement Update 


We will now derive the measurement update in lines 4, 5, and 6 (Table 3.1) of 
our Kalman filter algorithm. We begin with the general Bayes filter mecha- 
nism for incorporating measurements, stated in Equation (2.38) and restated 
here in annotated form: 


bel(z;)) = n p(z |z) bel(x+) 
—— —— 


ON (243Cext, Qt) ~N(resfie,Dt) 


The mean and covariance of bel(x;) are obviously given by fi and ©. The 
measurement probability p(2 | zz) was defined in (3.6) to be normal as well, 
with mean C; Zi and covariance Q+. Thus, the product is given by an expo- 
nential 


bel(zx;) = m exp{-Ji} 
with 
A = i(a-Owm) Q7" (zt — Cu) 1(z — fe)? XV (xe ju) 
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(3.42) 
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(3.44) 
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This function is quadratic in z+, hence bel(x;) is a Gaussian. To calculate 
its parameters, we once again calculate the first two derivatives of J; with 
respect to z;: 


OJ uc 

au m -C7 Qr! (zt — Cixi) + Sp! (we — fiz) 
t 

Od e = 

z CTOS Cts? 
t 


The second term is the inverse of the covariance of bel (x+): 
EX, = (QUOTES 


The mean of bel(x;) is the minimum of this quadratic function, which we 
now calculate by setting the first derivative of J; to zero (and substituting ji; 
for x+): 


Cl Qi (u-Cus) = X (ue — Pa) 
The expression on the left of the equal sign can be transformed as follows: 
C? Qi (zt — Cui) 

= OF Q7’ (zt — Ci m + Ct ii — Ct i) 

= OF Qr’ (a- Cele) - CE QT? Ce (pe — fie) 
Substituting this back into (3.39) gives us 
Cr Qr (Oui) = (Cf Qr Ce + Ep) Qs - hio) 

一 一 一 
三 了 1 

and hence we have 
Ue Ce Qr (2 — Ce fe) = m-i 


We now define the Kalman gain as 


Kı = MEE 
and obtain 
u = htt Ki (z — Ci fit) 


This proves the correctness of line 5 in the Kalman filter algorithm in Ta- 
ble 3.1. 
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The Kalman gain, as defined in (3.43), is a function of X. This is at odds 
with the fact that we utilize K; to calculate X; in line 6 of the algorithm. The 
following transformation shows us how to express A; in terms of covariances 
other than X. It begins with the definition of Ay in (3.43): 

K, Nn Or 
= D, OF Qi (Ci De CF + Qt) (Ce Se CE +. Qs) + 
SS 
=I 
= X(C QU Ci XC + CF QU Qi) (Ci $i CF + Qs)? 
—-——^ 
=I 
= X; (CF OF" C: Še OF +CT) (Ci È OF + QHT! 
= sq D SOLIS Se CP (CROT pO 
—— 





=I 
mE (01022505 XS SCP (Gee 07 40:7 
—— 
2x 
= XXX (GaXàcb-Q)' 
—— 


=I 
三 NOU B.C? QY 


This expression proves the correctness of line 4 of our Kalman filter algo- 
rithm. 

Line 6 is obtained by expressing the covariance using the Kalman gain K;. 
The advantage of the calculation in Table 3.1 over the definition in Equation 
(3.38) lies in the fact that we can avoid inverting the state covariance matrix. 
This is essential for applications of Kalman filters to high-dimensional state 
spaces. 

Our transformation is once again carried out using the inversion lemma, 
which was already stated in Table 3.2. Here we restate it using the notation 
of Equation (3.38): 


(Spl +O QT Gi) = 3-30 (Qi + Ci Xa 
This lets us arrive at the following expression for the covariance: 
X, = (CQ Or) 


= X -XCOD(Q + Ci $ CE) C, $ 

M- Xz CF (Qi + Ci CD) Ce] Ee 
~ = K, see Eq. (3.45) 

(I — K, Cy) Xs 
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This completes our correctness proof, in that it shows the correctness of line 
6 of our Kalman filter algorithm. 


The Extended Kalman Filter 


Why Linearize? 


The assumptions that observations are linear functions of the state and that 
the next state is a linear function of the previous state are crucial for the cor- 
rectness of the Kalman filter. The observation that any linear transformation 
of a Gaussian random variable results in another Gaussian random variable 
played an important role in the derivation of the Kalman filter algorithm. 
The efficiency of the Kalman filter is then due to the fact that the parameters 
of the resulting Gaussian can be computed in closed form. 

Throughout this and the following chapters, we will illustrate proper- 
ties of different density representations using the transformation of a one- 
dimensional Gaussian random variable. Figure 3.3a illustrates the linear 
transformation of such a random variable. The graph on the lower right 
shows the density of the random variable X ~ N(x; u, o°). Let us assume 
that X is passed through the linear function y = az + b, shown in the upper 
right graph. The resulting random variable, Y, is distributed according to a 
Gaussian with mean au +b and variance a?c?. This Gaussian is illustrated by 
the gray area in the upper left graph of Figure 3.3a. The reader may notice 
that this example is closely related to the next state update of the Kalman 
filter, with X = x, 1 and Y = zx, but without an additive noise variable; see 
also Equation (3.2). 

Unfortunately, state transitions and measurements are rarely linear in 
practice. For example, a robot that moves with constant translational and 
rotational velocity typically moves on a circular trajectory, which cannot be 
described by linear state transitions. This observation, along with the as- 
sumption of unimodal beliefs, renders plain Kalman filters, as discussed so 
far, inapplicable to all but the most trivial robotics problems. 

The extended Kalman filter, or EKF, relaxes one of these assumptions: the 
linearity assumption. Here the assumption is that the state transition proba- 
bility and the measurement probabilities are governed by nonlinear functions 
g and h, respectively: 


Zi = glut, 2-1) +E: 


Zt = h(x) + Ôt 
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Figure 3.3 (a) Linear and (b) nonlinear transformation of a Gaussian random vari- 
able. The lower right plots show the density of the original random variable, X. This 
random variable is passed through the function displayed in the upper right graphs 
(the transformation of the mean is indicated by the dotted line). The density of the 
resulting random variable Y is plotted in the upper left graphs. 
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This model strictly generalizes the linear Gaussian model underlying 
Kalman filters, as postulated in Equations (3.2) and (3.5). The function g 
replaces the matrices A; and B; in (3.2), and h replaces the matrix C; in 
(3.5). Unfortunately, with arbitrary functions g and h, the belief is no longer 
a Gaussian. In fact, performing the belief update exactly is usually impossi- 
ble for nonlinear functions g and h, and the Bayes filter does not possess a 
closed-form solution. 

Figure 33b illustrates the impact of a nonlinear transformation on a Gaus- 
sian random variable. The graphs on the lower right and upper right plot the 
random variable X and the nonlinear function g, respectively. The density 
of the transformed random variable, Y = g(X), is indicated by the gray area 
in the upper left graph of Figure 3.3b. Since this density cannot be computed 
in closed form, it was estimated by drawing 500,000 samples according to 
p(x), passing them through the function g, and then histogramming over the 
range of g. As can be seen, Y is not a Gaussian because the nonlinearities in 
9 distort the density of X in ways that destroy its Gaussian shape. 

The extended Kalman filter (EKF) calculates a Gaussian approximation 
to the true belief. The dashed curve in the upper left graph of Figure 3.3b 
shows the Gaussian approximation to the density of the random variable Y. 
Accordingly, EKFs represent the belief bel(z;) at time t by a mean u; and a 
covariance X,. Thus, the EKF inherits from the Kalman filter the basic belief 
representation, but it differs in that this belief is only approximate, not exact 
as was the case in Kalman filters. The goal of the EKF is thus shifted from 
computing the exact posterior to efficiently estimating its mean and covari- 
ance. However, since these statistics cannot be computed in closed form, the 
EKF has to resort to an additional approximation. 


Linearization Via Taylor Expansion 


The key idea underlying the EKF approximation is called linearization. Fig- 
ure 3.4 illustrates the basic concept. Linearization approximates the nonlin- 
ear function g by a linear function that is tangent to g at the mean of the 
Gaussian (dashed line in the upper right graph). Projecting the Gaussian 
through this linear approximation results in a Gaussian density, as indicated 
by the dashed line in the upper left graph. The solid line in the upper left 
graph represents the mean and covariance of the Monte-Carlo approxima- 
tion. The mismatch between these two Gaussians indicates the error caused 
by the linear approximation of g. 

The key advantage of the linearization, however, lies in its efficiency. 
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Figure 3.4 Illustration of linearization applied by the EKF. Instead of passing the 
Gaussian through the nonlinear function g, it is passed through a linear approxima- 
tion of g. The linear function is tangent to g at the mean of the original Gaussian. 
The resulting Gaussian is shown as the dashed line in the upper left graph. The lin- 
earization incurs an approximation error, as indicated by the mismatch between the 
linearized Gaussian (dashed) and the Gaussian computed from the highly accurate 
Monte-Carlo estimate (solid). 


The Monte-Carlo estimate of the Gaussian was achieved by passing 500,000 
points through g followed by the computation of their mean and covariance. 
The linearization applied by the EKF, on the other hand, only requires deter- 
mination of the linear approximation followed by the closed-form computa- 
tion of the resulting Gaussian. In fact, once g is linearized, the mechanics of 
the EKF's belief propagation are equivalent to those of the Kalman filter. 

This technique also is applied to the multiplication of Gaussians when a 
measurement function h is involved. Again, the EKF approximates h by a 
linear function tangent to h, thereby retaining the Gaussian nature of the 
posterior belief. 

There exist many techniques for linearizing nonlinear functions. EKFs 
utilize a method called (first order) Taylor expansion. Taylor expansion con- 
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(3.50) 


(3.51) 


(3.52) 
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(3.53) 


(3.54) 
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structs a linear approximation to a function g from g's value and slope. The 
slope is given by the partial derivative 


Og(ut, t-1) 


(ut, U1) :一 
g (ui ti) Dd 


Clearly, both the value of g and its slope depend on the argument of g. A 
logical choice for selecting the argument is to choose the state deemed most 
likely at the time of linearization. For Gaussians, the most likely state is 
the mean of the posterior jj, 4. In other words, g is approximated by its 
value at 4:1 (and at u), and the linear extrapolation is achieved by a term 
proportional to the gradient of g at 4-1 and uy: 


glut, 4-1) x g (ue, i1) T g (Ut, pi-1) (Zi 1 一 Hi_1) 
—— 

=: Gt 

= glut, pmi) + Ge (ici — pia) 





Written as Gaussian, the state transition probability is approximated as fol- 
lows: 


p(zi | Ut, Lt—1) 
1 
~ det(2rR:) 2 exp (^3 [ae — glut, ps1) — Gi (zi 一 AD 


Ri [we — glut, i1) — Gi (rica — )]] 


Notice that G; is a matrix of size n x n, with n denoting the dimension of 
the state. This matrix is often called the Jacobian. The value of the Jacobian 
depends on u; and u+—1, hence it differs for different points in time. 

EKFs implement the exact same linearization for the measurement func- 
tion h. Here the Taylor expansion is developed around j, the state deemed 
most likely by the robot at the time it linearizes h: 


h(z,) ~ h(i) + hn) (zi — ju) 
=: H; 
= h(i) + Hi (x; — fit) 





with A'(z,) = er. Written as a Gaussian, we have 





plil) = det(2zQU ? exp {4 [ze — hfe) — He (we — mI? 
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1: Algorithm Extended Kalman filter(j, 1, X4 1, ut, 21): 
2: ht = g(Ut, Ht—1) 
3: »»n 一 Gi Xi GT + Ri 
4: Ky = Mt TI UH. 2 HE + Qi)! 
5: Hi = Be + Ki (ze — h(fie)) 
6: D = (I - Ki Hi) Dt 
7: return pu, Xt 
Table3.3 The extended Kalman filter algorithm. 
The EKF Algorithm 


Table 3.3 states the EKF algorithm. In many ways, this algorithm is similar to 
the Kalman filter algorithm stated in Table 3.1. The most important differ- 
ences are summarized by the following table: 





| Kalman filter | EKF 
state prediction (line 2) At Ht—1 十 DB, Ut glut, t1) 
measurement prediction (line 5) C, fit h(fit) 


That is, the linear predictions in Kalman filters are replaced by their nonlin- 
ear generalizations in EKFs. Moreover, EKFs use Jacobians G; and H; instead 
of the corresponding linear system matrices A+, B,, and C; in Kalman filters. 
The Jacobian G; corresponds to the matrices A; and B;, and the Jacobian H; 
corresponds to C;. A detailed example for extended Kalman filters will be 
given in Chapter 7. 


Mathematical Derivation of the EKF 


The mathematical derivation of the EKF parallels that of the Kalman filter 
in Chapter 3.2.4, and hence shall only be sketched here. The prediction is 
calculated as follows (c.f. (3.8)): 
bel(z,) = / plar | zi, Ut) bel(z 1) daziaà 

jm ee — y—/ 


ON (2439 (Ut pa 1) 9- Gi(mi c3 — pai) Rt) ~N (icai, Yi) 





This distribution is the EKF analog of the prediction distribution in the 
Kalman filter, stated in (3.8). The Gaussian p(x, | x4~1, uz) can be found 
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in Equation (3.52). The function L; is given by (c.f. (3.11)) 
Ly = $ (te —9(us, me-1) — Gai m) 

Ri (te — g(us, i-i) — Ge(we-1 — 1-1) 

十 i (tei ea)? gl (eee — na) 





which is quadratic in both z;..; and z+, as above. As in (3.12), we decompose 
It into It (24-1, Xt) and It (xz): 
Li(zia, Lt) 
= $(m1-® [GP Rpt (ae — g(us, jii) Gau) + pei)” 9! 
(zi-1 — © [GF Rz! (we — glue, pii) + Guu-i) + Epitel) 


with 

® = (GPR G+ Erh) 

and hence 

I(t) = Te — g(us, pai) Giu)! Ret (awe — glut, pii) 二 Gu 





1 
3 ( 
+4 (zia — mca)? Nyy (yea — ua) 
-$ IGE Re? (zi — g(uc pii) + Gee) + Yita] 
$, [GT Rz! (zi — g(ue, pii) + Gau) + Xs ua] 
As the reader can easily verify, setting the first derivative of L;(x;) to 
zero gives us the update ju; = g(ui, HU .1), in analogy to the derivation in 
Equations (3.27) through (3.31). The second derivative is given by (Ri + 
Gi X41 GT)! (see (3.32)). 

The measurement update is also derived analogously to the Kalman filter 
in Chapter 3.2.4. In analogy to (3.33), we have for the EKF 
bel(zi) = m p(2 | zx) bel(;) 

—— \ 一 一 一 
AN (zesh(fe)+He (£e— At), Qt) ~N (xii) 
using the linearized state transition function from (3.53). This leads to the 
exponent (see (3.35)): 
4 = i(m-h(Bm)-H (2 fit)” Qr! (2 — h(i) — He (£4 — Bit)) 
t$ — fis)" XL (we — pa) 

The resulting mean and covariance is given by 
Me = pu Kila — h(i) 
Soo (KB) 
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with the Kalman gain 
K, = X Hy (Ay Xa Hi +Q)! 


The derivation of these equations is analogous to Equations (3.36) through 
(3.47). 


Practical Considerations 


The EKF has become just about the most popular tool for state estimation in 
robotics. Its strength lies in its simplicity and in its computational efficiency. 
As was the case for the Kalman filter, each update requires time O(k?^ + 
n?), where k is the dimension of the measurement vector z;, and n is the 
dimension of the state vector x+. Other algorithms, such as the particle filter 
discussed further below, may require time exponential in n. 

The EKF owes its computational efficiency to the fact that it represents the 
belief by a multivariate Gaussian distribution. A Gaussian is a unimodal 
distribution, which can be thought of as a single guess annotated with an 
uncertainty ellipse. In many practical problems, Gaussians are robust estima- 
tors. Applications of the Kalman filter to state spaces with 1,000 dimensions 
or more will be discussed in later chapters of this book. EKFs have been ap- 
plied with great success to a number of state estimation problems that violate 
the underlying assumptions. 

An important limitation of the EKF arises from the fact that it approxi- 
mates state transitions and measurements using linear Taylor expansions. In 
most robotics problems, state transitions and measurements are nonlinear. 
The goodness of the linear approximation applied by the EKF depends on 
two main factors: The degree of uncertainty and the degree of local non- 
linearity of the functions that are being approximated. The two graphs in 
Figure 3.5 illustrate the dependency on the uncertainty. Here, two Gaussian 
random variables are passed through the same nonlinear function (c.f. also 
Figure 3.4). While both Gaussians have the same mean, the variable shown 
in (a) has a higher uncertainty than the one in (b). Since the Taylor expansion 
only depends on the mean, both Gaussians are passed through the same lin- 
ear approximation. The gray areas in the upper left plots of the two figures 
show the densities of the resulting random variable, computed by Monte- 
Carlo estimation. The density resulting from the wider Gaussian is far more 
distorted than the density resulting from the narrow, less uncertain Gaus- 
sian. The Gaussian approximations of these densities are given by the solid 
lines in the figures. The dashed graphs show the Gaussians estimated by the 
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Figure 3.5 Dependency of approximation quality on uncertainty. Both Gaussians 
(lower right) have the same mean and are passed through the same nonlinear func- 
tion (upper right). The higher uncertainty of the left Gaussian produces a more dis- 
torted density of the resulting random variable (gray area in upper left graph). The 
solid lines in the upper left graphs show the Gaussians extracted from these densities. 


The dashed lines represent the Gaussians generated by the linearization applied by 
the EKF. 
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linearization. A comparison to the Gaussians resulting from the Monte-Carlo 
approximations illustrates the fact that higher uncertainty typically results in 
less accurate estimates of the mean and covariance of the resulting random 
variable. 

The second factor for the quality of the linear Gaussian approximation is 
the local nonlinearity of the function g, as illustrated in Figure 3.6. Shown 
there are two Gaussians with the same variance passed through the same 
nonlinear function. In Panel (a), the mean of the Gaussian falls into a more 
nonlinear region of the function g than in Panel (b). The mismatch between 
the accurate Monte-Carlo estimate of the Gaussian (solid line, upper left) 
and the Gaussian resulting from linear approximation (dashed line) shows 
that higher nonlinearities result in larger approximation errors. The EKF 
Gaussian clearly underestimates the spread of the resulting density. 

Sometimes, one might want to pursue multiple distinct hypotheses. For 
example, a robot might have two distinct hypotheses as to where it is, but the 
arithmetic mean of these hypotheses is not a likely contender. Such situations 
require multi-modal representations for the posterior belief. EKFs, in the 
form described here, are incapable of representing such multimodal beliefs. 
A common extension of EKFs is to represent posteriors using mixtures, or 
sums, of Gaussians. A mixture of Gaussians may be of the form 


u 1 
ESTY 


NI 


bel(xz) 5 Pi det (27341) exp f- iG = Mea)” Dep (we — in)) 
l 


Here y4, are mixture parameters with v,; > 0. These parameters serve 
as weights of the mixture components. They are estimated from the like- 
lihoods of the observations conditioned on the corresponding Gaussians. 
EKFs that utilize such mixture representations are called multi-hypothesis (ex- 
tended) Kalman filters, or MHEKF. 

To summarize, if the nonlinear functions are approximately linear at the 
mean of the estimate, then the EKF approximation may generally be a good 
one, and EKFs may approximate the posterior belief with sufficient accuracy. 
Furthermore, the less certain the robot, the wider its Gaussian belief, and the 
more it is affected by nonlinearities in the state transition and measurement 
functions. In practice, when applying EKFs it is therefore important to keep 
the uncertainty of the state estimate small. 
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Figure 3.6 Dependence of the approximation quality on local nonlinearity of the 
function g. Both Gaussians (lower right in each of the two panels) have the same co- 
variance and are passed through the same function (upper right). The linear approx- 
imation applied by the EKF is shown as the dashed lines in the upper right graphs. 
The solid lines in the upper left graphs show the Gaussians extracted from the highly 
accurate Monte-Carlo estimates. The dashed lines represent the Gaussians generated 
by the EKF linearization. 
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The Unscented Kalman Filter 


The Taylor series expansion applied by the EKF is only one way to linearize 
the transformation of a Gaussian. Two other approaches have often been 
found to yield superior results. One is known as moments matching (and the 
resulting filter is known as assumed density filter, or ADF), in which the lin- 
earization is calculated in a way that preserves the true mean and the true 
covariance of the posterior distribution (which is not the case for EKFs). An- 
other linearization method is applied by the unscented Kalman filter, or UKF, 
which performs a stochastic linearization through the use of a weighted sta- 
tistical linear regression process. We now discuss the UKF algorithm without 
mathematical derivation. The reader is encouraged to read more details in 
the literature referenced in the bibliographical remarks. 


Linearization Via the Unscented Transform 


Figure 3.7 illustrates the linearization applied by the UKF, called the un- 
scented transform. Instead of approximating the function g by a Taylor series 
expansion, the UKF deterministically extracts so-called sigma points from the 
Gaussian and passes these through g. In the general case, these sigma points 
are located at the mean and symmetrically along the main axes of the covari- 
ance (two per dimension). For an n-dimensional Gaussian with mean p and 
covariance ©, the resulting 2n + 1 sigma points A! are chosen according to 
the following rule: 





xo = u 
xi = y+ ( (n- XE). fori =1,...,n 
xi = u-( (*XE). fori —n-1,...,2n 


Here \ = o? (n-- &) —n, with o and « being scaling parameters that determine 
how far the sigma points are spread from the mean. Each sigma point X [i] 
has two weights associated with it. One weight, wË], is used when comput- 
ing the mean, the other weight, wl, is used when recovering the covariance 
of the Gaussian. 








oi 4, ce 
m n+ 

whl = a +(1-—a? +8) 
B nA 
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Figure 3.7 Illustration of linearization applied by the UKF. The filter first extracts 
2n 十 1 weighted sigma points from the n-dimensional Gaussian (n — 1 in this exam- 
ple). These sigma points are passed through the nonlinear function g. The linearized 
Gaussian is then extracted from the mapped sigma points (small circles in the upper 
right plot). As for the EKF, the linearization incurs an approximation error, indi- 
cated by the mismatch between the linearized Gaussian (dashed) and the Gaussian 
computed from the highly accurate Monte-Carlo estimate (solid). 


1 
(n+) 


wi = wä = fori = 1, 2m 


The parameter 6 can be chosen to encode additional (higher order) know- 
ledge about the distribution underlying the Gaussian representation. If the 
distribution is an exact Gaussian, then 8 = 2 is the optimal choice. 

The sigma points are then passed through the function g, thereby probing 
how g changes the shape of the Gaussian. 


yi = gam 


The parameters (j:’ %’) of the resulting Gaussian are extracted from the 
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mapped sigma points Yİ according to 

2n 

i-0 

2n 
X = Soul VA B - ur. 

i-0 


Figure 3.8 illustrates the dependency of the unscented transform on the 
uncertainty of the original Gaussian. For comparison, the results using the 
EKF Taylor series expansion are plotted alongside the UKF results. 

Figure 3.9 shows an additional comparison between UKF and EKF approx- 
imation, here in dependency of the local nonlinearity of the function g. As 
can be seen, the unscented transform is more accurate than the first order 
Taylor series expansion applied by the EKF. In fact, it can be shown that the 
unscented transform is accurate in the first two terms of the Taylor expan- 
sion, while the EKF captures only the first order term. (It should be noted, 
however, that both the EKF and the UKF can be modified to capture higher 
order terms.) 


The UKF Algorithm 


The UKF algorithm utilizing the unscented transform is presented in Ta- 
ble 3.4. The input and output are identical to the EKF algorithm. Line 2 
determines the sigma points of the previous belief using Equation (3.66), 
with ^ short for Vm + 4. These points are propagated through the noise-free 
state prediction in line 3. The predicted mean and variance are then com- 
puted from the resulting sigma points (lines 4 and 5). R; in line 5 is added to 
the sigma point covariance in order to model the additional prediction noise 
uncertainty (compare line 3 of the EKF algorithm in Table 3.3). The predic- 
tion noise R, is assumed to be additive. Later, in Chapter 7, we present a 
version of the UKF algorithm that performs more accurate estimation of the 
prediction and measurement noise terms. 

A new set of sigma points is extracted from the predicted Gaussian in line 
6. This sigma point set X, now captures the overall uncertainty after the pre- 
diction step. In line 7, a predicted observation is computed for each sigma 
point. The resulting observation sigma points Z, are used to compute the 
predicted observation 2 and its uncertainty, S+. The matrix Q; is the co- 
variance matrix of the additive measurement noise. Note that S; represents 
the same uncertainty as H; X, HT + Q: in line 4 of the EKF algorithm in 
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Figure 3.8 Linearization results for the UKF depending on the uncertainty of the 
original Gaussian. The results of the EKF linearization are also shown for comparison 
(c.f. Figure3.5). The unscented transform incurs smaller approximation errors, as can 
be seen by the stronger similarity between the dashed and the solid Gaussians. 


Table 3.3. Line 10 determines the cross-covariance between state and obser- 
vation, which is then used in line 11 to compute the Kalman gain K;. The 
cross-covariance X^ corresponds to the term X, Hf in line 4 of the EKF 
algorithm. With this in mind it is straightforward to show that the estima- 
tion update performed in lines 12 and 13 is of equivalent form to the update 
performed by the EKF algorithm. 
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Figure 3.9 Linearization results for the UKF depending on the mean of the origi- 
nal Gaussian. The results of the EKF linearization are also shown for comparison 
(c.f. Figure3.6). The sigma point linearization incurs smaller approximation errors, as 


can be seen by the stronger similarity between the dashed and the solid Gaussians. 


The asymptotic complexity of the UKF algorithm is the same as for the 
EKF. In practice, the EKF is often slightly faster than the UKF. The UKF is still 
highly efficient, even with this slowdown by a constant factor. Furthermore, 
the UKF inherits the benefits of the unscented transform for linearization. 
For purely linear systems, it can be shown that the estimates generated by 
the UKF are identical to those generated by the Kalman filter. For nonlin- 
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1: Algorithm Unscented Kalman filter(j. 1, 34.1, Ut, zt): 
Am uoi Hei HV St o guai Yy Mea) 
AF = g(ui, Xi) 
2n 
4: n — 5 wä x [i] 
i=0 
5: Mt — X wil (X, [i] = ia), [i] = pit 4 R, 
i=0 
X= (ie fe + WV Xs Ba — Y 53) 
Z, = h(A) 
2n PES 
8 Zt = 5 wË T] 
i=0 
2n l ; ) 
9 Se= X uf (zB £ yz = &)7 +Q 
i=0 
2n 7 
10 ye? 一 So will ea i) gg = &)* 
i=0 
11 Ki = SP sy 
12: Le = Pi + Kin — 2) 
13: D; =; — K; S; KT 
14: return p, Ut 











Table 3.4 The unscented Kalman filter algorithm. The variable n denotes the di- 
mensionality of the state vector. 


ear systems the UKF produces equal or better results than the EKF, where 
the improvement over the EKF depends on the nonlinearities and spread 
of the prior state uncertainty. In many practical applications, the difference 
between EKF and UKF is negligible. 

Another advantage of the UKF is the fact that it does not require the com- 
putation of Jacobians, which are difficult to determine in some domains. The 
UKF is thus often referred to as a derivative-free filter. 
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Finally, the unscented transform has some resemblance to the sample- 
based representation used by particle filters, which will be discussed in the 
next chapter. A key difference however, is that the sigma points of the 
unscented transform are determined deterministically, while particle filters 
draw samples randomly. This has important implications. If the underlying 
distribution is approximately Gaussian, then the UKF representation is far 
more efficient than the particle filter representation. If, on the other hand, the 
belief is highly non-Gaussian, then the UKF representation is too restrictive 
and the filter can perform arbitrarily poorly. 


The Information Filter 


The dual of the Kalman filter is the information filter, or IF. Just like the KF 
and its nonlinear versions, the EKF and the UKF, the information filter repre- 
sents the belief by a Gaussian. Thus, the standard information filter is subject 
to the same assumptions underlying the Kalman filter. The key difference 
between the KF and the IF arises from the way the Gaussian belief is rep- 
resented. Whereas in the Kalman filter family of algorithms, Gaussians are 
represented by their moments (mean, covariance), information filters repre- 
sent Gaussians in their canonical parameterization, which is comprised of an 
information matrix and an information vector. The difference in parameter- 
ization leads to different update equations. In particular, what is computa- 
tionally complex in one parameterization happens to be simple in the other 
(and vice versa). The canonical and the moments parameterizations are often 
considered dual to each other, and thus are the IF and the KF. 


Canonical Parameterization 


The canonical parameterization of a multivariate Gaussian is given by a matrix 
Q and a vector £. The matrix Q is the inverse of the covariance matrix: 


Q = x 


Q is called the information matrix, or sometimes the precision matrix. The vec- 
tor £ is called the information vector. It is defined as 


E= X'y 


It is easy to see that 2. and € are a complete parameterization of a Gaussian. 
In particular, the mean and covariance of the Gaussian can easily be obtained 
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from the canonical parameterization by the inverse of (3.70) and (3.71): 
ELA Q 

p = Q'E 

The canonical parameterization is often derived by multiplying out the expo- 


nent of a Gaussian. In (3.1), we defined the multivariate normal distribution 
as follows: 


1 
p(x) = det (2nB)~? exp(-i(z — 1) E (a — p)} 

A straightforward sequence of transformations leads to the following param- 
eterization: 


1 
plx) = det (27h) 2 exp {—ga7 Sle +a? ool su S| ph 


1 
det (27) 2 exp (-iuTX yu) exp {-taT Dole 十 ZL 
Se 
const. 


The term labeled “const.” does not depend on the target variable x. Hence, 
it can be subsumed into the normalizer 7. 


p(r) = 7 exp {—hat Sigg So u} 
This form motivates the parameterization of a Gaussian by its canonical pa- 
rameters 2 and £. 


p(z) = mexp{-3znz+z E} 
In many ways, the canonical parameterization is more elegant than the mo- 


ments parameterization. In particular, the negative logarithm of the Gaus- 
sian is a quadratic function in x, with the canonical parameters ( and £: 
—logp(x) = const. + ix Qz—zl£ 
Here "const." is a constant. The reader may notice that we cannot use the 
symbol 7 to denote this constant, since negative logarithms of probabilities 
do not normalize to 1. The negative logarithm of our distribution p(x) is 
quadratic in x, with the quadratic term parameterized by Q and the linear 
term by €. In fact, for Gaussians, Q must be positive semidefinite, hence 
— log p(x) is a quadratic distance function with mean pp = Q7! €. This is 
easily verified by setting the first derivative of (3.78) to zero: 
o|- log p(z)] 
Ox 
The matrix 2 determines the rate at which the distance function increases 
in the different dimensions of the variable x. A quadratic distance that is 
weighted by a matrix Q is called a Mahalanobis distance. 


0 = Qr- = 0 —» e='é 
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1: Algorithm Information_filter(€;—1, Q; 4, Ut, zt): 


n» 


Qy = (A. OF, AT + Ru)! 
Et = QUA NC € 1 + B, ut) 


o 


4 ù = CT Qj' Ci (s 
5: &-ClQiuc&ü 
6: return £,, u 











Table 3.5 The information filter algorithm. 


The Information Filter Algorithm 


Table 3.5 states the update algorithm known as information filter. Its input is 
a Gaussian in its canonical parameterization €_; and Q;—1, representing the 
belief at time t — 1. Just like all Bayes filters, its input includes the control 
u, and the measurement z,. The output are the parameters £, and Q, of the 
updated Gaussian. 

The update involves matrices A;, B,, C;, Ry, and Q+. Those were defined 
in Chapter 3.2. The information filter assumes that the state transition and 
measurement probabilities are governed by the following linear Gaussian 
equations, originally defined in (3.2) and (3.5): 


Tt = Ama + Biu tE 
zt = Cyr, 十 Ot 


Here R; and Q; are the covariances of the zero-mean noise variables e; and 
dz, respectively. 

Just like the Kalman filter, the information filter is updated in two steps, a 
prediction step and a measurement update step. The prediction step is im- 
plemented in lines 2 and 3 in Table 3.5. The parameters & and C), describe 
the Gaussian belief over x; after incorporating the control u;, but before in- 
corporating the measurement z+. The latter is done through lines 4 and 5. 
Here the belief is updated based on the measurement z+. 

These two update steps can be vastly different in complexity, especially if 
the state space possesses many dimensions. The prediction step, as stated 
in Table 3.5, involves the inversion of two matrices of the size n x n, where 
n is the dimension of the state space. This inversion requires approximately 
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O(n?^) time. In Kalman filters, the update step is additive and requires at 
most O(n?) time; it requires less time if only a subset of variables is affected 
by a control, or if variables transition independently of each other. These 
roles are reversed for the measurement update step. Measurement updates 
are additive in the information filter. They require at most O(n?) time, and 
they are even more efficient if measurements carry only information about 
a subset of all state variables at a time. The measurement update is the dif- 
ficult step in Kalman filters. It requires matrix inversion whose worst case 
complexity is O(n?-4). This illustrates the dual character of Kalman and in- 
formation filters. 


Mathematical Derivation of the Information Filter 


The derivation of the information filter is analogous to that of the Kalman 
filter. 

To derive the prediction step (lines 2 and 3 in Table 3.5), we begin with the 
corresponding update equations of the Kalman filters, which can be found 
in lines 2 and 3 of the algorithm in Table 3.1 and are restated here for the 
reader's convenience: 


fie = Appui Bou 
5 一 A, Xii AT 十 FR 


The information filter prediction step follows now directly by substituting 
the moments u and X by the canonical parameters € and €? according to their 
definitions in (3.72) and (3.73): 


Me-1 = 97 &t-1 

Xa = Qc 1 

Substituting these expressions in (3.82) and (3.83) gives us the set of predic- 
tion equations 

Q = (AO ADR)" 

& = (QA OL &-1 Bru) 

These equations are identical to those in Table 3.5. As is easily seen, the 
prediction step involves two nested inversions of a potentially large matrix. 
These nested inversions can be avoided when only a small number of state 


variables is affected by the motion update, a topic that will be discussed later 
in this book. 
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The derivation of the measurement update is even simpler. We begin with 
the Gaussian of the belief at time t, which was provided in Equation (3.35) 
and is restated here once again: 
bel(x+) 

= n exp{—} (a — Crs)” Qi. (a — Cin) — $ (2 — fie)” X1 (we — i)] 
For Gaussians represented in their canonical form this distribution is given 
by 
bel(x+) 

= nexp{—j af CQ Qj Oreta, Cf Qi a 3 af Qer + of &} 





which, by reordering the terms in the exponent, resolves to 
bel(zj) = m exp {-4 rT [CF QI) Ci + 0u) re + a? [CF Qr) za + &J} 


We can now read off the measurement update equations, by collecting the 
terms in the squared brackets: 


& = Cr Qat 
Q; == CT On 6 cO 


These equations are identical to the measurement update equations in lines 
4 and 5 of Table 3.5. 


The Extended Information Filter Algorithm 


The extended information filter, or EIF, extends the information filter to the 
nonlinear case, very much in the same way the EKF is the non-linear exten- 
sion of the Kalman filter. Table 3.6 depicts the EIF algorithm. The predic- 
tion is realized in lines 2 through 4, and the measurement update in lines 5 
through 7. These update equations are largely analog to the linear informa- 
tion filter, with the functions g and h (and their Jacobian G; and H;) replacing 
the parameters of the linear model A;, B,, and C;. As before, g and h specify 
the nonlinear state transition function and measurement function, respec- 
tively. Those were defined in (3.48) and (3.49) and are restated here: 


Zi = glut, 24-1) +E: 


Zt = h(a) + ôi 


Unfortunately, both g and h require a state as an input. This mandates the 
recovery of a state estimate u from the canonical parameters. The recovery 
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1: Algorithm Extended information filter(£, 1, Q; 4, we, z+): 
2: At-1 = Qi Ei 

3: 0, 一 (Gi Q4 GT + R)! 

4: & = Gi glut, i-i) 

5: ba = g(ui, pii) 

6: ù% = 0, - HF Q7) H 

7: & = & + HT Qr [ze — h(ps) + H Bl 

8: return £;, 0 











Table 3.6 The extended information filter (EIF) algorithm. 


takes place in line 2, in which the state j1,_1 is calculated from Q,_1 and 1 
in the obvious way. Line 5 computes the state jj, using the equation familiar 
from the EKF (line 2 in Table 3.3). The necessity to recover the state esti- 
mate seems at odds with the desire to represent the filter using its canonical 
parameters. We will revisit this topic when discussing the use of extended 
information filters in the context of robotic mapping. 


Mathematical Derivation of the Extended Information Filter 


The extended information filter is easily derived by essentially performing 
the same linearization that led to the extended Kalman filter above. As in 
(3.51) and (3.53), the extended information filter approximates g and h by a 
Taylor expansion: 


glut, Bea) © glut, i) + Gi Gea — He) 
h(x) h(i) + Hi (xt — pu) 


2 


Here G, and H, are the Jacobians of g and h at jj. and ji, respectively: 


G = g' (ut, pai) 
A, = K(i) 


These definitions are equivalent to those in the EKF. The prediction step is 
now derived from lines 2 and 3 of the EKF algorithm (Table 3.3), which are 
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restated here: 
5 = Gi 3a GT 十 Ri 
Bt = guai) 


Substituting X, ; by Q7}; and ji; by 0; ! £, gives us the prediction equations 
of the extended information filter: 


Q, (Gi OF, GF + Ro)! 
& = % glue, 0l &-1) 


The measurement update is derived from Equations (3.60) and (3.61). In par- 
ticular, (3.61) defines the following Gaussian posterior: 


bel(xe) = n exp{—} (z — (ie) — He (a — fe)” Qi 


(2 — h(is) — He (21 — fi)) 7 5 (ae — fe) X (ae — fe) } 


Multiplying out the exponent and reordering the terms gives us the follow- 
ing expression for the posterior: 





bel(r) =  exp{—3 z; Hi Qj. Him v; Ay QV. [e — h(a) + Hi fe] 
-isl Sy) a, tal Sy" fie} 
= 9 exp{—3a¢ [Hf er H+ X] n 


vx [HF Qr [e hhe) + Hip] + De? pe] 


With X; ! = Qu this expression resolves to the following information form: 
bel(z;) = m exp { 一 亏 iT [HF oum + 9] Lt 
cal [HF Qi! [ze — h(jis) + Hi js] + &l 


We can now read off the measurement update equations by collecting the 
terms in the squared brackets: 


ù% = Q&Q-HIQ;H, 
& = &4HLQ;! [u — h(py) + Hs fu] 


Practical Considerations 


When applied to robotics problems, the information filter possesses several 
advantages over the Kalman filter. For example, representing global un- 
certainty is simple in the information filter: simply set 2 = 0. When us- 
ing moments, such global uncertainty amounts to a covariance of infinite 
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magnitude. This is especially problematic when sensor measurements carry 
information about a strict subset of all state variables, a situation often en- 
countered in robotics. Special provisions have to be made to handle such 
situations in EKFs. The information filter tends to be numerically more sta- 
ble than the Kalman filter in many of the applications discussed later in this 
book. 

As we shall see in later chapters of this book, the information filter and sev- 
eral extensions enable a robot to integrate information without immediately 
resolving it into probabilities. This can be of great advantage in complex 
estimation problems, involving hundreds or even millions of variables. For 
such large problems, the integration á la Kalman filter induces severe compu- 
tational problems, since any new piece of information requires propagation 
through a large system of variables. The information filter, with appropriate 
modification, can side-step this issue by simply adding the new information 
locally into the system. However, this is not a property yet of the simple 
information filter discussed here; we will extend this filter in Chapter 12. 

Another advantage of the information filter over the Kalman filter arises 
from its natural fit for multi-robot problems. Multi-robot problems often in- 
volve the integration of sensor data collected decentrally. Such integration is 
commonly performed through Bayes rule. When represented in logarithmic 
form, Bayes rule becomes an addition. As noted above, the canonical param- 
eters of information filters represent a probability in logarithmic form. Thus, 
information integration is achieved by summing up information from mul- 
tiple robots. Addition is commutative. Because of this, information filters 
can often integrate information in arbitrary order, with arbitrary delays, and 
in a completely decentralized manner. While the same is possible using the 
moments parameterization—after all, they represent the same information— 
the necessary overhead for doing so is much higher. Despite this advantage, 
the use of information filters in multi-robot systems remains largely under- 
explored. We will revisit the multi-robot topic in Chapter 12. 

These advantages of the information filter are offset by important limita- 
tions. A primary disadvantage of the extended information filter is the need 
to recover a state estimate in the update step, when applied to nonlinear sys- 
tems. This step, if implemented as stated here, requires the inversion of the 
information matrix. Further matrix inversions are required for the prediction 
step of the information filters. In many robotics problems, the EKF does not 
involve the inversion of matrices of comparable size. For high dimensional 
state spaces, the information filter is generally believed to be computation- 
ally inferior to the Kalman filter. In fact, this is one of the reasons why the 


»js.on 000000 


MARKOV RANDOM 
FIELD 


3.6 


3.6 Summary 79 


EKF has been vastly more popular than the extended information filter. 

As we will see later in this book, these limitations do not necessarily apply 
to problems in which the information matrix possesses structure. In many 
robotics problems, the interaction of state variables is local; as a result, the 
information matrix may be sparse. Such sparseness does not translate to 
sparseness of the covariance. 

Information filters can be thought of as graphs, where states are connected 
whenever the corresponding off-diagonal element in the information matrix 
is non-zero. Sparse information matrices correspond to sparse graphs; in 
fact, such graphs are commonly known as Gaussian Markov random fields. A 
flurry of algorithms exist to perform the basic update and estimation equa- 
tions efficiently for such fields, under names like loopy belief propagation. In 
this book, we will encounter a mapping problem in which the information 
matrix is (approximately) sparse, and develop an extended information fil- 
ter that is significantly more efficient than both Kalman filters and non-sparse 
information filters. 


Summary 


In this section, we introduced efficient Bayes filter algorithms that represent 
the posterior by multivariate Gaussians. We noted that 


* Gaussians can be represented in two different ways: The moments param- 
eterization and the canonical parameterization. The moments parameter- 
ization consists of the mean (first moment) and the covariance (second 
moment) of the Gaussian. The canonical, or natural, parameterization 
consists of an information matrix and an information vector. Both param- 
eterizations are duals of each other, and each can be recovered from the 
other via matrix inversion. 


* Bayes filters can be implemented for both parameterizations. When using 
the moments parameterization, the resulting filter is called Kalman filter. 
The dual of the Kalman filter is the information filter, which represents 
the posterior in the canonical parameterization. Updating a Kalman fil- 
ter based on a control is computationally simple, whereas incorporating 
a measurement is more difficult. The opposite is the case for the infor- 
mation filter, where incorporating a measurement is simple, but updating 
the filter based on a control is difficult. 


* For both filters to calculate the correct posterior, three assumptions have 
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to be fulfilled. First, the initial belief must be Gaussian. Second, the state 
transition probability must be composed of a function that is linear in 
its argument with added independent Gaussian noise. Third, the same 
applies to the measurement probability. It must also be linear in its argu- 
ment, with added Gaussian noise. Systems that meet these assumptions 
are called linear Gaussian systems. 


Both filters can be extended to nonlinear problems. One technique de- 
scribed in this chapter calculates a tangent to the nonlinear function. Tan- 
gents are linear, making the filters applicable. The technique for finding 
a tangent is called Taylor expansion. Performing a Taylor expansion in- 
volves calculating the first derivative of the target function, and evaluat- 
ing it at a specific point. The result of this operation is a matrix known as 
the Jacobian. The resulting filters are called ‘extended.’ 


The unscented Kalman filter uses a different linearization technique, 
called unscented transform. It probes the function to be linearized at se- 
lected points and calculates a linearized approximation based on the out- 
comes of these probes. This filter can be implemented without the need 
for any Jacobians, it is thus often referred to as derivative-free. The un- 
scented Kalman filter is equivalent to the Kalman filter for linear systems 
but often provides improved estimates for nonlinear systems. The com- 
putational complexity of this filter is the same as for the extended Kalman 
filter. 


The accuracy of Taylor series expansions and unscented transforms de- 
pends on two factors: The degree of nonlinearity in the system, and the 
width of the posterior. Extended filters tend to yield good results if the 
state of the system is known with relatively high accuracy, so that the re- 
maining covariance is small. The larger the uncertainty, the higher the 
error introduced by the linearization. 


One of the primary advantages of Gaussian filters is computational: The 
update requires time polynomial in the dimensionality of the state space. 
This is not the case of some of the techniques described in the next chap- 
ter. The primary disadvantage is their confinement to unimodal Gaussian 
distributions. 


An extension of Gaussians to multimodal posteriors is known as multi- 
hypothesis Kalman filter. This filter represents the posterior by a mixture 
of Gaussians, which is nothing else but a weighted sum of Gaussians. 
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The mechanics of updating this filter require mechanisms for splitting and 
fusing or pruning individual Gaussians. Multi-hypothesis Kalman filters 
are particularly well suited for problems with discrete data association, 
which commonly occur in robotics. 


* Within the multivariate Gaussian regime, both filters, the Kalman filter 
and the information filter, have orthogonal strengths and weaknesses. 
However, the Kalman filter and its nonlinear extension, the extended 
Kalman filter, are vastly more popular than the information filter. 


The selection of the material in this chapter is based on today’s most popular 
techniques in robotics. There exists a huge number of variations and exten- 
sions of Gaussian filters, which address the various limitations and short- 
comings of the individual filters. 

A good number of algorithms in this book are based on Gaussian fil- 
ters. Many practical robotics problems require extensions that exploit sparse 
structures or factorizations of the posterior. 


Bibliographical Remarks 


The Kalman filter was invented by Swerling (1958) and Kalman (1960). It is usually introduced 
as an optimal estimator under the least-squares assumption, and less frequently as a method for 
calculating posterior distributions—although under the appropriate assumptions both views 
are identical. There exists a number of excellent textbooks on Kalman filters and information 
filters, including the ones by Maybeck (1990) and Jazwinsky (1970). Contemporary treatments 
of Kalman filters with data association are provided by Bar-Shalom and Fortmann (1988); Bar- 
Shalom and Li (1998). 

The inversion lemma can be found in Golub and Loan (1986). Matrix inversion can be car- 
ried out in O(n2.376) time, according to Coppersmith and Winograd (1990). This result is the 
most recent one in a series of papers that provided improvements over the O(n?) complexity of 
the variable elimination algorithm. The series started with Strassen’s (1969) seminal paper, in 
which he gave an algorithm requiring O(n2?:897). Cover and Thomas (1991) provides a survey 
of information theory, but with a focus on discrete systems. The unscented Kalman filter is due 
to Julier and Uhlmann (1997). A comparison of UKF to the EKF in the context of various state 
estimation problems can be found in van der Merwe (2004). Minka (2001) provided a recent 
treatment of moments matching and assumed density filtering for Gaussian mixtures. 


Exercises 
1. In this and the following exercise, you are asked to design a Kalman filter 


for a simple dynamical system: a car with linear dynamics moving in a 
linear environment. Assume At = 1 for simplicity. The position of the 
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car at time t is given by x+. Its velocity is t, and its acceleration is 2. 
Suppose the acceleration is set randomly at each point in time, according 
to a Gaussian with zero mean and covariance c? = 1. 


(a) What is a minimal state vector for the Kalman filter (so that the result- 
ing system is Markovian)? 


(b) For your state vector, design the state transition probability p(x, | 
ui,24 1). Hint: this transition function will possess linear matrices A 
and B and a noise covariance R (c.f., Equation (3.4) and Table 3.1). 


(c) Implement the state prediction step of the Kalman filter. Assuming we 
know at time t = 0, zo = to = ig = 0. Compute the state distributions 
for times t — 1,2,...,5. 








(d) For each value of t, plot the joint posterior over x and z in a diagram, 
where « is the horizontal and « is the vertical axis. For each posterior, 
UNCERTAINTY ELLIPSE you are asked to plot an uncertainty ellipse, which is the ellipse of points 
that are one standard deviation away from the mean. Hint: If you do 
not have access to a mathematics library, you can create those ellipses 
by analyzing the eigenvalues of the covariance matrix. 


(e) What will happen to the correlation between x; and t as t 1 oo? 


2. We will now add measurements to our Kalman filter. Suppose at time 
t, we can receive a noisy observation of x. In expectation, our sensor 
measures the true location. However, this measurement is corrupted by 
Gaussian noise with covariance c? — 10. 


(a) Define the measurement model. Hint: You need to define a matrix C 
and another matrix Q (c.f., Equation (3.6) and Table 3.1). 


(b) Implement the measurement update. Suppose at time t = 5, we ob- 
serve a measurement z — 5. State the parameters of the Gaussian es- 
timate before and after updating the KF. Plot the uncertainty ellipse 
before and after incorporating the measurement (see above for instruc- 
tions as to how to plot an uncertainty ellipse). 


3. In Chapter 3.2.4, we derived the prediction step of the KF. This step is 
often derived with Z transforms or Fourier transforms, using the Convo- 
lution Theorem. Re-derive the prediction step using transforms. Notice: 
This exercise requires knowledge of transforms and convolution, which goes be- 
yond the material in this book. 
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4. We noted in the text that the EKF linearization is an approximation. To 
see how bad this approximation is, we ask you to work out an example. 
Suppose we have a mobile robot operating in a planar environment. Its 
state is its x-y-location and its global heading direction 0. Suppose we 
know z and y with high certainty, but the orientation 0 is unknown. This 
is reflected by our initial estimate 


H 


(a) 


(b) 


(c) 


(d) 


(e) 


= (0 0 0) and X= 0 001 0 
O 0 10000 


Draw, graphically, your best model of the posterior over the robot pose 
after the robot moves d = 1 units forward. For this exercise, we assume 
the robot moves flawlessly without any noise. Thus, the expected lo- 
cation of the robot after motion will be 





x' x + cos Ó 
y = y+sind 
0! 0 


For your drawing, you can ignore 0 and only draw the posterior in 
x-y-coordinates. 


Now develop this motion into a prediction step for the EKF. For that, 
you have to define a state transition function and linearize it. You then 
have to generate a new Gaussian estimate of the robot pose using the 
linearized model. You should give the exact mathematical equations 
for each of these steps, and state the Gaussian that results. 


Draw the uncertainty ellipse of the Gaussian and compare it with your 
intuitive solution. 


Now incorporate a measurement. Our measurement shall be a noisy 
projection of the x-coordinate of the robot, with covariance Q = 0.01. 
Specify the measurement model. Now apply the measurement both 
to your intuitive posterior, and formally to the EKF estimate using the 
standard EKF machinery. Give the exact result of the EKF, and compare 
it with the result of your intuitive analysis. 


Discuss the difference between your estimate of the posterior, and the 
Gaussian produced by the EKF. How significant are those differences? 
What can be changed to make the approximation more accurate? What 
would have happened if the initial orientation had been known, but 
not the robot's y-coordinate? 
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5. The Kalman filter Table 3.1 lacked a constant additive term in the mo- 
tion and the measurement models. Extend this algorithm to contain such 
terms. 


6. Prove (via example) the existence of a sparse information matrix in multi- 
variate Gaussians (of dimension d) that correlate all d variables with cor- 
relation coefficient that are e-close to 1. We say an information matrix 
is sparse if all but a constant number of elements in each row and each 
column are zero. 
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A popular alternative to Gaussian techniques are nonparametric filters. Non- 
parametric filters do not rely on a fixed functional form of the posterior, such 
as Gaussians. Instead, they approximate posteriors by a finite number of 
values, each roughly corresponding to a region in state space. Some non- 
parametric Bayes filters rely on a decomposition of the state space, in which 
each such value corresponds to the cumulative probability of the posterior 
density in a compact subregion of the state space. Others approximate the 
state space by random samples drawn from the posterior distribution. In all 
cases, the number of parameters used to approximate the posterior can be 
varied. The quality of the approximation depends on the number of param- 
eters used to represent the posterior. As the number of parameters goes to 
infinity, nonparametric techniques tend to converge uniformly to the correct 
posterior—under specific smoothness assumptions. 

This chapter discusses two nonparametric approaches for approximating 
posteriors over continuous spaces with finitely many values. The first de- 
composes the state space into finitely many regions, and represents the pos- 
terior by a histogram. A histogram assigns to each region a single cumulative 
probability; they are best thought of as piecewise constant approximations to 
a continuous density. The second technique represents posteriors by finitely 
many samples. The resulting filter is known as particle filter and has become 
immensely popular in robotics. 

Both types of techniques, histograms and particle filters, do not make 
strong parametric assumptions on the posterior density. In particular, they 
are well-suited to represent complex multimodal beliefs. For this reason, 
they are often the method of choice when a robot has to cope with phases 
of global uncertainty, and when it faces hard data association problems that 
yield separate, distinct hypotheses. However, the representational power of 
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these techniques comes at the price of added computational complexity. 

Fortunately, both nonparametric techniques described in this chapter 
make it possible to adapt the number of parameters to the (suspected) com- 
plexity of the posterior. When the posterior is of low complexity (e.g., fo- 
cused on a single state with a small margin of uncertainty), they use only 
small numbers of parameters. For complex posteriors, e.g., posteriors with 
many modes scattered across the state space, the number of parameters 
grows larger. 

Techniques that can adapt the number of parameters to represent the pos- 
terior online are called adaptive. They are called resource-adaptive if they can 
adapt based on the computational resources available for belief computation. 
Resource-adaptive techniques play an important role in robotics. They en- 
able robots to make decisions in real time, regardless of the computational 
resources available. Particle filters are often implemented as a resource- 
adaptive algorithm, by adapting the number of particles online based on the 
available computational resources. 


The Histogram Filter 


Histogram filters decompose the state space into finitely many regions and 
represent the cumulative posterior for each region by a single probability 
value. When applied to finite spaces, such filters are known as discrete Bayes 
filters; when applied to continuous spaces, they are commonly called his- 
togram filters. We will first describe the discrete Bayes filter and then discuss 
its use in continuous state spaces. 


The Discrete Bayes Filter Algorithm 


Discrete Bayes filters apply to problems with finite state spaces, where the ran- 
dom variable X, can take on finitely many values. We already encountered a 
discrete Bayes filter in Chapter 2.4.2, when discussing the example of a robot 
estimating the probability that a door is open. Some of the robotic mapping 
problems discussed in later chapters also involve discrete random variables. 
For example, occupancy grid mapping algorithms assume that each location 
in the environment is either occupied or free. The corresponding random 
variable is binary. It can take on two different values. Thus, finite state spaces 
play an important role in robotics. 

Table 4.1 provides pseudo-code for the discrete Bayes filter. This code is 
derived from the general Bayes filter in Table 2.1 by replacing the integration 
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de Algorithm Discrete Bayes filter((p; ; 1], we, z+): 
2- for all k do 

3: Pkt = So (Xi = Tp | Ut, Xii = vi) Dita 
4: Pkt = 1 P(2 | Xt = zx) Pi 

5: endfor 

6: return {pp t} 











Table 4.1 The discrete Bayes filter. Here x;, £k denote individual states. 


with a finite sum. The variables zi and zk denote individual states, of which 
there may only be finitely many. The belief at time t is an assignment of a 
probability to each state ;,, denoted p; ,. Thus, the input to the algorithm is 
a discrete probability distribution {px t}, along with the most recent control 
u, and measurement z,. Line 3 calculates the prediction, the belief for the 
new state based on the control alone. This prediction is then updated in line 
4, so as to incorporate the measurement. The discrete Bayes filter algorithm 
is popular in many areas of signal processing, where it is often referred to as 
the forward pass of a hidden Markov model, or HMM. 


Continuous State 


Of particular interest will be the use of discrete Bayes filters as an approxi- 
mate inference tool for continuous state spaces. As noted above, such filters 
are called histogram filters. Figure 4.1 illustrates how a histogram filter rep- 
resents a random variable and its nonlinear transform. Shown there is the 
projection of a histogrammed Gaussian through a nonlinear function. The 
original Gaussian distribution possesses 10 bins. So does the projected prob- 
ability distribution, but in two of the resulting bins the probability is so close 
to zero that they cannot be seen in this figure. Figure 4.1 also shows the 
correct continuous distributions for comparison. 

Histogram filters decompose a continuous state space into finitely many 
bins, or regions: 


dom(X,) = Xj, U X2 Usni XKt 
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Figure 4.1 Histogram representation of a continuous random variable. The gray 
shaded area in the lower right plot shows the density of the continuous random vari- 
able, X. The histogram approximation of this density is overlaid in light-gray. The 
random variable is passed through the function displayed in the upper right graph. 
The density and the histogram approximation of the resulting random variable, Y, 
are plotted in the upper left graph. The histogram of the transformed random vari- 
able was computed by passing multiple points from each histogram bin of X through 
the nonlinear function. 




















p(x) 














x 





Here X, is the familiar random variable describing the state of the robot at 
time t. The function dom(X;) denotes the state space, which is the universe 
of possible values that X, might assume. Each xk,t describes a convex region. 
These regions together form a partitioning of the state space. For each i Z k 
we have x; N Xx, = Ü and [Jj Xk, = dom(X;). 

A straightforward decomposition of a continuous state space is a multi- 
dimensional grid, where each x; is a grid cell. Through the granularity of 
the decomposition, we can trade off accuracy and computational efficiency. 
Fine-grained decompositions infer smaller approximation errors than coarse 
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ones, but at the expense of increased computational complexity. 

As we already discussed, the discrete Bayes filter assigns to each region 
Xy; a probability, pz. Within each region, the discrete Bayes filter carries no 
further information on the belief distribution. Thus, the posterior becomes a 
piecewise constant PDF, which assigns a uniform probability to each state x; 
within each region xy, ;: 





plz) = 


Here |x;,;| is the volume of the region x; ¢. 

If the state space is truly discrete, the conditional probabilities p(xi | 
Ut, Xit—-1) and D(2 | Xķ,t) are well-defined, and the algorithm can be im- 
plemented as stated. In continuous state spaces, one is usually given the 
densities p(x; | ut, z;11) and p(z | zi), which are defined for individual 
states (and not for regions in state space). For cases where each region xx + 
is small and of the same size, these densities are usually approximated by 
substituting xj; by a representative of this region. For example, we might 
simply “probe” using the mean state in xz 4 


tht = bul f v, da, 
Xk,t 


One then simply replaces 
Pli | Xk) ~ plz | ĉr) 
P(Xk,t | Ut, Xit-1) = N xil p(2x; | Ue, it-1) 


These approximations are the result of the piecewise uniform interpretation 
of the discrete Bayes filter stated in (4.2), and a Taylor-approximation analo- 
gous to the one used by EKFs. 


Mathematical Derivation of the Histogram Approximation 


To see that (4.4) is a reasonable approximation, we note that p(z, | xz.) can 
be expressed as the following integral: 


p(zi, Xk,t) 
P(Xx,t) 


7 D(Zt, Lt) dai 
f p(zi) da 


plz | Xk,t) 
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人 ze | x+) p(x) dae 


[reed dar 
Xk,t 
k, 
f plz | Lt) Pk da; 
(4.2) Xk,t [xxl 
/ Pk,t dz, 
Xk,t [Xk] 


plz | Lt) da; 














Pk,t 


[Xk] Xk,t 


Pk,t 
xi. «| / 1 daz 
Xk,t 
f p(z | £+) dai 
Xk,t 


/ 1 dax, 
Xk,t 


= eget f ste dis 
Xk,t 














This expression is an exact description of the desired probability under the 
piecewise uniform distribution model in (4.2). If we now approximate p(z: | 
xt) by p(z | ĉr, t) for x, € xx; we obtain 


(&7  »(m|xk) 8 [Xk J p(zi | x,t) doi 
Xk,t 


[xd plz | & f 1 dz, 
Xk,t 


[Xx 7! p(z | $4) |Xx,tl 


plz $1) 





which is the approximation stated above in (4.4). 

The derivation of the approximation to p(x;,; | Uz, Xi, 1) in (4.5) is slightly 
more involved, since regions occur on both sides of the conditioning bar. In 
analogy to our transformation above, we obtain: 


(4.8) p(xi, | Ut, Xi,t-1) 
D(X Meet | ut) 


P(Xi,t-1 | ut) 
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is ji p(zi, Lt 1 | ut) dai, dag i 
ca 1 





da. » p(zi-i | Ut) dzia 


ri J p(zi | Ut, 4-1) p(zia | ut) dat, da, i 
Xk,t Xi, t—1 





Jaret p(zi-i | Ut) dz 


We now exploit the Markov assumption, which implies independence be- 
tween z,..; and uz, and thus p(zi 1 | ui) = p(zi-i): 


(4.9) p(Xx, t | Ut, Xi,t— 1) 


2 J p(xe | Ut, Le 1) p(zi- 1) dz, dai 
Xi,t—1 





f D(Zt-1) dxy—1 
Xi, t—1 





Pi,t— 
i / p(ae | uc 21-1) — = d£, d£t—1 
Xk,t Y Xi, t—1 [xi ii 








T Pi, t—1 ds 
Xi,t—1 [xi i-i] 


| / p(zi | Ut, 24-1). day, d£t—1 
Xk,t X; -i 





/ 1 dz, 1 
Xi, t—1 


bil? f J p(zi | ut, 24-1) dai, dag 1 
Xk,t Xi, ,t—1 


If we now approximate p(x, | ur, £t—1) by p(2y1 | Ut, ĉit—1) as before, we 
obtain the following approximation. Note that the normalizer 7 becomes 
necessary to ensure that the approximation is a valid probability distribution: 


(4.10) p(Xx, t | Ut, Xi t— 1) 


a2 
2 


7] 


7] 


7] 
n 





Xi t— al J J DU] Ut, Êi t- 1) dz, dz, 
Xi, ,t—1 


Xii | plêk, | un tier) | f 1 dx, d£t—1 
Xk,t Xi,t—1 


Xiti t D(2y a | Ut, ĉi t-1)|Xk tl [Xi t-11] 


Xy 4| Plêk,t | Ut, ĉit—1) 


If all regions are of equal size (meaning that |x% +| is the same for all k), we 
can simply omit the factor |x% +|, since it is subsumed by the normalizer. The 
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Figure 4.2 Dynamic vs. static decomposition. The upper left graph shows the static 
histogram approximation of the random variable Y, using 10 bins for covering the 
domain of Y (of which 6 are of nearly zero probability). The upper middle graph 
presents a tree representation of the same random variable, using the same number 
of bins. 


resulting discrete Bayes filter is then equivalent to the algorithm outlined in 
Table 4.1. If implemented as stated there, the auxiliary parameters py do not 
constitute a probability distribution, since they are not normalized (compare 
line 3 to (4.10). However, normalization takes place in line 4, so that the 
output parameters are indeed a valid probability distribution. 


Decomposition Techniques 


In robotics, decomposition techniques of continuous state spaces come in 
two basic flavors: static and dynamic. Static techniques rely on a fixed decom- 
position that is chosen in advance, irrespective of the shape of the posterior 
that is being approximated. Dynamic techniques adapt the decomposition to 
the specific shape of the posterior distribution. Static techniques are usually 
easier to implement, but they can be wasteful with regards to computational 
resources. 

A primary example of a dynamic decomposition technique is the family 
of density trees. Density trees decompose the state space recursively, in ways 
that adapt the resolution to the posterior probability mass. The intuition be- 
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hind this decomposition is that the level of detail in the decomposition is 
a function of the posterior probability: The less likely a region, the coarser 
the decomposition. Figure 4.2 illustrates the difference between a static grid 
representation and a density tree representation. Due to its more compact 
representation, the density tree achieves a higher approximation quality us- 
ing the same number of bins. Dynamic techniques like density trees can often 
cut the computation complexity by orders of magnitude over static ones, yet 
they require additional implementation effort. 

An effect similar to that of dynamic decompositions can be achieved by 
selective updating. When updating a posterior represented by a grid, selective 
techniques update a fraction of all grid cells only. A common implementation 
of this idea updates only those grid cells whose posterior probability exceeds 
a user-specified threshold. 

Selective updating can be viewed as a hybrid decomposition, which de- 
composes the state space into a fine-grained grid and one large set that con- 
tains all regions not chosen by the selective update procedure. In this light, it 
can be thought of as a dynamic decomposition technique, since the decision 
as to which grid cells to consider during the update is made online, based 
on the shape of the posterior distribution. Selective updating techniques can 
reduce the computational effort involved in updating beliefs by orders of 
magnitude. They make it possible to use grid decompositions in spaces of 
three or more dimensions. 

The mobile robotics literature often distinguishes topological from metric 
representations of space. While no clear definition of these terms exist, topo- 
logical representations are often thought of as coarse graph-like representa- 
tions, where nodes in the graph correspond to significant places (or features) 
in the environment. For indoor environments, such places may correspond 
to intersections, T-junctions, dead ends, and so on. The resolution of such 
decompositions, thus, depends on the structure of the environment. Alter- 
natively, one might decompose the state space using regularly-spaced grids. 
Such a decomposition does not depend on the shape and location of the en- 
vironmental features. Grid representations are often thought of as metric 
although, strictly speaking, it is the embedding space that is metric, not the 
decomposition. In mobile robotics, the spatial resolution of grid representa- 
tions tends to be higher than that of topological representations. For instance, 
some of the examples in Chapter 7 use grid decompositions with cell sizes 
of 10 centimeters or less. This increased accuracy comes at the expense of 
increased computational costs. 
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1: Algorithm binary_Bayes_filter(/;_;, z;): 
2: ly = l1 + log PEEL — log PO 
3: return l 











Table 4.2 The binary Bayes filter in log odds form with an inverse measurement 
model. Here /; is the log odds of the posterior belief over a binary state variable that 
does not change over time. 


Binary Bayes Filters with Static State 


Certain problems in robotics are best formulated as estimation problems with 
binary state that does not change over time. Those problems are addressed 
by the binary Bayes filter. Problems of this type arise if a robot estimates a 
fixed binary quantity in the environment from a sequence of sensor mea- 
surements. For example, a robot might want to know if a door is open or 
closed, in a context where the door state does not change during sensing. 
Another example of binary Bayes filters with static state are occupancy grid 
maps, which we will encounter in Chapter 9. 

When the state is static, the belief is a function only of the measurements: 


bel; (x) z p(x | Zi Us) = p(x | zi) 


where the state is chosen from two possible values, denoted by x and ~g. In 
particular, we have bel;(^r) = 1 — bel;(x). The lack of a time index for the 
state x reflects the fact that the state does not change. 

Naturally, binary estimation problems of this type can be tackled using 
the discrete Bayes filter in Table 4.1. However, the belief is commonly imple- 
mented as a log odds ratio. The odds of a state x is defined as the ratio of the 
probability of this event divided by the probability of its negate 


pr) _ p) 
pow) ^ 1-p(o) 





The log odds is the logarithm of this expression 
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Log odds assume values from —oo to oo. The Bayes filter for updating beliefs 
in log odds representation is computationally elegant. It avoids truncation 
problems that arise for probabilities close to 0 or 1. 

Table 4.2 states the basic update algorithm. This algorithm is additive; in 
fact, any algorithm that increments and decrements a variable in response 
to measurements can be interpreted as a Bayes filter in log odds form. This 
binary Bayes filter uses an inverse measurement model p(x | z;), instead of the 
familiar forward model p(z; | x). The inverse measurement model specifies a 
distribution over the (binary) state variable as a function of the measurement 
Zt. 

Inverse models are often used in situations where measurements are more 
complex than the binary state. An example of such a situation is the problem 
of estimating whether or not a door is closed from camera images. Here the 
state is extremely simple, but the space of all measurements is huge. It is 
easier to devise a function that calculates a probability of a door being closed 
from a camera image, than describing the distribution over all camera images 
that show a closed door. In other words, it is easier to implement an inverse 
than a forward sensor model. 

As the reader easily verifies from our definition of the log odds (4.13), the 
belief bel;(x) can be recovered from the log odds ratio / by the following 
equation: 


1 


bel —— lr 
elim) 1+ expli] 


To verify the correctness of our binary Bayes filter algorithm, we briefly re- 
state the basic filter equation with the Bayes normalizer made explicit: 


p(z: | £, 214-1) DZ | 214-1) 
p(z: | z14—1) 
p(z | x) p(z | zii) 
pe | 9183) 





P(t | at) = 





We now apply Bayes rule to the measurement model p(z; | x): 


p(x | z) p(z) 


Peels) = C 
and obtain 
p(x | zia) p(x | 2) p(z) pla | Aui) 





p(x) p(z: | z14-1) 
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(4.19) 


(4.20) 


(4.21) 
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By analogy, we have for the opposite event =z: 


pOT | 2%) pla) pz | 214-1) 
p(z) p(z | 214-1) 





p(^x | zu) 


Dividing (4.17) by (4.18) leads to cancellation of various difficult-to-calculate 
probabilities: 


pr|za) _ prm) pE |z) poe) 
pOr | zi) pz|l2 pOr | z1z-1) p(z) 
p(x | zi) p(x | 212-1)  1—p(e) 
l1—-p(r|z)l1—-p(z|za-i) p(z) 
We denote the log odds ratio of the belief bel;(x) by l;(z). The log odds belief 
at time t is given by the logarithm of (4.19). 














Dogg PEDE) og Pene) Qu Lob) 
L(x) = log PEST, 十 log Doa log Ae) 

= o p(x | zi) - p(x) 5 

p Ee ae 1a) eG) + la ) 


Here p(x) is the prior probability of the state x. As in (4.20), each measure- 
ment update involves the addition of the prior (in log odds form). The prior 
also defines the log odds of the initial belief before processing any sensor 
measurement: 


p(x) 
1 — p(x) 





lo(z) = log 


The Particle Filter 


Basic Algorithm 


The particle filter is an alternative nonparametric implementation of the Bayes 
filter. Just like histogram filters, particle filters approximate the posterior by 
a finite number of parameters. However, they differ in the way these param- 
eters are generated, and in which they populate the state space. The key idea 
of the particle filter is to represent the posterior bel(x;) by a set of random 
state samples drawn from this posterior. Figure 4.3 illustrates this idea for 
a Gaussian. Instead of representing the distribution by a parametric form— 
which would have been the exponential function that defines the density 
of a normal distribution—particle filters represent a distribution by a set of 
samples drawn from this distribution. Such a representation is approximate, 
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Figure 4.3 The “particle” representation used by particle filters. The lower right 
graph shows samples drawn from a Gaussian random variable, X. These samples are 
passed through the nonlinear function shown in the upper right graph. The resulting 
samples are distributed according to the random variable Y. 


but it is nonparametric, and therefore can represent a much broader space of 
distributions than, for example, Gaussians. Another advantage of the sam- 
ple based representation is its ability to model nonlinear transformations of 
random variables, as shown in Figure 4.3. 

In particle filters, the samples of a posterior distribution are called particles 
and are denoted 


Ap = alt, eas ol 

Each particle al” (with 1 < m < M) is a concrete instantiation of the state 
at time t. Put differently, a particle is a hypothesis as to what the true world 
state may be at time t. Here M denotes the number of particles in the particle 
set X. In practice, the number of particles M is often a large number, e.g., 
M = 1,000. In some implementations M is a function of t or other quantities 
related to the belief bel(x;). 
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1: Algorithm Particle filter(AX, .;, uz, z+): 
2: Xx, = Xi = 0 

3: form = 1 to M do 

4: sample x"! ~ p(x; | uw, 2!) 
5 wp = p(ze | ap) 

6: A, = Æe + Gl, ui) 

7: endfor 

8: form — 1 to M do 

9: draw i with probability cx ma 
10: add x” to x, 

11: endfor 

12: return A; 











Table 4.3 The particle filter algorithm, a variant of the Bayes filter based on impor- 
tance sampling. 


The intuition behind particle filters is to approximate the belief bel(x;) by 
the set of particles X;. Ideally, the likelihood for a state hypothesis x; to be 
included in the particle set 1; shall be proportional to its Bayes filter posterior 
bel(xz): 


xl" a p(zi | Zits U1:t) 


As a consequence of (4.23), the denser a subregion of the state space is popu- 
lated by samples, the more likely it is that the true state falls into this region. 
As we will discuss below, the property (4.23) holds only asymptotically for 
M 1 oo for the standard particle filter algorithm. For finite M, particles are 
drawn from a slightly different distribution. In practice, this difference is 
negligible as long as the number of particles is not too small (e.g., M > 100). 

Just like all other Bayes filter algorithms discussed thus far, the particle fil- 
ter algorithm constructs the belief bel (x+) recursively from the belief bel (x. .1) 
one time step earlier. Since beliefs are represented by sets of particles, this 
means that particle filters construct the particle set X, recursively from the 
set AX, 1. 

The most basic variant of the particle filter algorithm is stated in Table 4.3. 
The input of this algorithm is the particle set 4. .;, along with the most re- 
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cent control u; and the most recent measurement z+. The algorithm then first 
constructs a temporary particle set V that represented the belief bel(z,). It 


does this by systematically processing each particle Eu in the input particle 
set 4, ,. Subsequently, it transforms these particles into the set A,, which 
approximates the posterior distribution bel (z;). In detail: 


1. 


2. 


[m] 


Line 4 generates a hypothetical state x; 
oll and the control u+. The resulting sample is indexed by m, indicat- 
ing that it is generated from the m-th particle in 1;_,. This step involves 
sampling from the state transition distribution p(x; | uz, zi 1). To imple- 
ment this step, one needs to be able to sample from this distribution. The 
set of particles obtained after M iterations is the filter’s representation of 


bel(a;). 


for time t based on the particle 


Line 5 calculates for each particle ol”) the so-called importance factor, de- 
noted wh, Importance factors are used to incorporate the measurement 
2 into the particle set. The importance, thus, is the probability of the 
measurement z, under the particle ol"), given by wl” = p(z | ot”). If 
we interpret wl! as the weight of a particle, the set of weighted particles 
represents (in approximation) the Bayes filter posterior bel(z;). 


The real “trick” of the particle filter algorithm occurs in lines 8 through 
11 in Table 4.3. These lines implemented what is known as resampling 
or importance sampling. The algorithm draws with replacement M parti- 
cles from the temporary set A. The probability of drawing each particle 
is given by its importance weight. Resampling transforms a particle set 
of M particles into another particle set of the same size. By incorporat- 
ing the importance weights into the resampling process, the distribution 
of the particles change: Whereas before the resampling step, they were 
distributed according to bel(x;), after the resampling they are distributed 
(approximately) according to the posterior bel(z;) = n p(z: | al”) Joel (a). 
In fact, the resulting sample set usually possesses many duplicates, since 
particles are drawn with replacement. More important are the particles 
not contained in X;: Those tend to be the particles with lower importance 
weights. 


The resampling step has the important function to force particles back to 
the posterior bel(x;). In fact, an alternative (and usually inferior) version of 
the particle filter would never resample, but instead would maintain for each 


»js.on 000000 


100 


(4.24) 


4.3.2 
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particle an importance weight that is initialized by 1 and updated multiplica- 
tively: 

up) = pfa | ag!) wp 

Such a particle filter algorithm would still approximate the posterior, but 
many of its particles would end up in regions of low posterior probability. 
As a result, it would require many more particles; how many depends on 
the shape of the posterior. The resampling step is a probabilistic implemen- 
tation of the Darwinian idea of survival of the fittest: It refocuses the particle 
set to regions in state space with high posterior probability. By doing so, it 
focuses the computational resources of the filter algorithm to regions in the 
state space where they matter the most. 


Importance Sampling 


For the derivation of the particle filter, it shall prove useful to discuss the 
resampling step in more detail. 

Intuitively, we are faced with the problem of computing an expectation 
over a probability density function f, but we are only given samples gener- 
ated from a different probability density function, g. For example, we might 
be interested in the expectation that x € A. We can express this probability 
as an expectation over g. Here J is the indicator function, which is 1 if its 
argument is true, and 0 otherwise. 


E(x eA] = ] 1e 
= 一 一 g(x) I(a € A) dx 


=w(2) 
= Ejw(x) I(x € A)] 
Here w(x) = zc is a weighting factor that accounts for the “mismatch” 
between f and g. For this equation to be correct, we need f(x) > 0 — 
g(x) > 0. 

The importance sampling algorithm utilizes this transformation. Fig- 
ure 4.4a shows a density function f of a probability distribution, which 
henceforth will be called the target distribution. As before, what we would 
like to achieve is to obtain a sample from f. However, sampling from f di- 
rectly shall be impossible. We instead generate particles from a density g in 
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Figure 4.4 Illustration of importance factors in particle filters: (a) We seek to ap- 
proximate the target density f. (b) Instead of sampling from f directly, we can only 
generate samples from a different density, g. Samples drawn from g are shown at 
the bottom of this diagram. (c) A sample of f is obtained by attaching the weight 
f(x)/g(x) to each sample x. In particle filters, f corresponds to the belief bel(zz) and 


g to the belief bel(a«). 


5rj s. cn 


000000 


102 


PROPOSAL 
DISTRIBUTION 


(4.26) 


(4.27) 


(4.28) 
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Figure 4.4b. The distribution that corresponds to the density g is called pro- 
posal distribution. The density g must be such that f(x) > 0 implies g(x) > 0, 
so that there is a non-zero probability to generate a particle when sampling 
from g for any state that might be generated by sampling from f. However, 
the resulting particle set, shown at the bottom of Figure 4.4b, is distributed 
according to g, not to f. In particular, for any interval A C dom(X) (or more 
generally, any Borel set A) the empirical count of particles that fall into A 
converges to the integral of g under A: 

| 
— I(al™ e 4) — » g(x) dx 

M m=1 A 
To offset this difference between f and g, particles x!) are weighted by the 
quotient 


[m] 
g(x) 
This is illustrated by Figure 4.4c: The vertical bars in this figure indicate 


the magnitude of the importance weights. Importance weights are the non- 
normalized probability mass of each particle. In particular, we have 





M Sb mu 
wil Tale A) ull — f(x) dx 
2E | 


where the first term serves as the normalizer for all importance weights. In 
other words, even though we generated the particles from the density g, 
the appropriately weighted particles converge to the density f. It can be 
shown that under mild conditions, this approximation converges to the de- 
sired E,|I(x € A)] for arbitrary sets A. In most cases, the rate of convergence 
is in OZ where M is the number of samples. The constant factor de- 
pends on the similarity of f(x) and g(x). 

In particle filters, the density f corresponds to the target belief bel(z;). 
Under the (asymptotically correct) assumption that the particles in X;—ı are 
distributed according to bel(x;_1), the density g corresponds to the product 
distribution: 


p(x | Ut, Lt-1) bel(z4—1) 


Once again, this distribution is the proposal distribution. 
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Mathematical Derivation of the PF 


To derive particle filters mathematically, it shall prove useful to think of par- 
ticles as samples of state sequences 


ol abl all, al 

It is easy to modify the algorithm accordingly: Simply append to the particle 
all the sequence of state samples from which it was generated oll |. This 
particle filter calculates the posterior over all state sequences: 


bel(xo:t) = D(Zo: | Urt, zia) 


instead of the belief bel(z;) = p(zi | wiz, 21). Admittedly, the space over 
all state sequences is huge, and covering it with particles is usually not such 
a good idea. However, this shall not deter us here, as this definition serves 
only as the means to derive the particle filter algorithm in Table 4.3. 

The posterior bel(zo.,) is obtained analogously to the derivation of bel(z,) 
in Chapter 2.4.3. In particular, we have 


p(zo: | Zl: uis) 


Bayes 

= n p(z | XO:t; Z1:4—1, U1:t) P(Lo:t | Z1:t-1, U1:t) 
Markov 

二 n p(zi | £t) p(®o:e | z14—1, at) 

= n p(z | Lt) plz | To:t—1; Ž1:t—1; U1:t) P(Xo:t-1 | Z1:t-1, U1:t) 
Markov 

= n p(z | Lt) p(x | Tt—1, Ut) p(£o:+—1 | 214-1) Ul:t—1) 


Notice the absence of integral signs in this derivation, which is the result 
of maintaining all states in the posterior, not just the most recent one as in 
Chapter 2.4.3. 

The derivation is now carried out by induction. The initial condition is 
trivial to verify, assuming that our first particle set is obtained by sampling 
the prior p(zo). Let us assume that the particle set at time t — 1 is distributed 
according to bel(xo:..1). For the m-th particle abri in this set, the sample 


zm generated in Step 4 of our algorithm is generated from the proposal 
distribution: 





P(e | vii, Ue) bel(zo«—1) = p(t | via, Ue) ptos | 214-1, 14-1) 
with 

[m] target distribution 
Ui = 


proposal distribution 
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n p(z | Zt) p(zi | Lt—-1, Ut) D(Zo:t 1 | Bici urs) 
p(zi | Lt—1, Ut) P(Xo:t-1 | Zo:t—1; U0:t—1) 
T n p(z | Lt) 





The constant 7 plays no role since the resampling takes place with prob- 
abilities proportional to the importance weights. By resampling particles 
with probability proportional to w, the resulting particles are indeed dis- 
tributed according to the product of the proposal and the importance weights 


wl) ; 


n wl P(X | Te-1, Ut) P(£o:t—1 | zo4-1:u04-1) = bel(zo4) 
(Notice that the constant factor 7 here differs from the one in (4.34).) The 
algorithm in Table 4.4 follows now from the simple observation that if abr 
is distributed according to bel(xo:+), then the state sample al"! is (trivially) 
distributed according to bel(z,). 

As we will argue below, this derivation is only correct for M f oo, due toa 
laxness in our consideration of the normalization constants. However, even 
for finite M it explains the intuition behind the particle filter. 


Practical Considerations and Properties of Particle Filters 
Density Extraction 


The sample sets maintained by particle filters represent discrete approxima- 
tions of continuous beliefs. Many applications, however, require the avail- 
ability of continuous estimates, that is, estimates not only at the states repre- 
sented by particles, but at any point in the state space. The problem of ex- 
tracting a continuous density from such samples is called density estimation. 
We will only informally illustrate some approaches to density estimation. 

Figure 4.5 illustrates different ways of extracting a density from particles. 
The leftmost graph shows the particles and density of the transformed Gaus- 
sian from our standard example (c.f. Figure 4.3). A simple and highly effi- 
cient approach to extracting a density from such particles is to compute a 
Gaussian approximation, as illustrated by the dashed Gaussian in Figure 4.5(b). 
In this case, the Gaussian extracted from the particles is virtually identical to 
the Gaussian approximation of the true density (solid line). 

Obviously, a Gaussian approximation captures only basic properties of a 
density, and it is only appropriate if the density is unimodal. Multimodal 
sample distributions require more complex techniques such as k-means clus- 
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tering, which approximates a density using mixtures of Gaussians. An alter- 
native approach is illustrated in Figure 4.5(c). Here, a discrete histogram is 
superimposed over the state space and the probability of each bin is com- 
puted by summing the weights of the particles that fall into its range. As 
with histogram filters, an important shortcoming of this technique is the fact 
that the space complexity is exponential in the number of dimensions. On the 
other hand, histograms can represent multi-modal distributions, they can be 
computed extremely efficiently, and the density at any state can be extracted 
in time independent of the number of particles. 

The space complexity of histogram representations can be reduced signif- 
icantly by generating a density tree from the particles, as discussed in Chap- 
ter 4.1.4. However, density trees come at the cost of more expensive lookups 
when extracting the density at any point in the state space (logarithmic in the 
depth of the tree). 

Kernel density estimation is another way of converting a particle set into a 
continuous density. Here, each particle is used as the center of a so-called 
kernel, and the overall density is given by a mixture of the kernel densi- 
ties. Figure 4.5(d) shows such a mixture density resulting from placing a 
Gaussian kernel at each particle. The advantage of kernel density estimates 
is their smoothness and algorithmic simplicity. However, the complexity of 
computing the density at any point is linear in the number of particles, or 
kernels. 

Which of these density extraction techniques should be used in practice? 
This depends on the problem at hand. For example, in many robotics ap- 
plications, processing power is very limited and the mean of the particles 
provides enough information to control the robot. Other applications, such 
as active localization, depend on more complex information about the un- 
certainty in the state space. In such situations, histograms or mixtures of 
Gaussians are a better choice. The combination of data collected by multiple 
robots sometimes requires the multiplication of densities underlying differ- 
ent sample sets. Density trees or kernel density estimates are well suited for 
this purpose. 


Sampling Variance 


An important source of error in the particle filter relates to the variation 
inherent in random sampling. Whenever a finite number of samples is 
drawn from a probability density, statistics extracted from these samples dif- 
fer slightly from the statistics of the original density. For instance, if we draw 
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Figure 4.5 Different ways of extracting densities from particles. (a) Density and 
sample set approximation, (b) Gaussian approximation (mean and variance), (c) his- 
togram approximation, (d) kernel density estimate. The choice of approximation 
strongly depends on the specific application and the computational resources. 


Srjs.cn 000000 


4.3 The Particle Filter 


(a) 25 samples 








ply) 
—— Samples 
Kernel density 

















p(y) 








p(y) 
— Samples 





Kernel density 












M TT 





ply) 








p(y) 
— Samples 
Kernel density 














p(y) 


(b) 250 samples 








p(y) 
— Samples 
Kernel density 




















p(y) 


p(y) 
— Samples 
Kernel density 








p(y) 








p(y) 
— Samples 
Kernel density 


























p(y) 


107 


Figure 4.6 Variance due to random sampling. Samples are drawn from a Gaus- 
sian and passed through a nonlinear function. Samples and kernel estimates result- 
ing from repeated sampling of 25 (left column) and 250 (right column) samples are 
shown. Each row shows one random experiment. 
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samples from a Gaussian random variable, then the mean and variance of 
the samples will differ from the mean and variance of the original random 
variable. Variability due to random sampling is called the variance of the 
sampler. 

Imagine two identical robots with identical, Gaussian beliefs performing 
identical, noise-free actions. Obviously, both robots should have the same 
belief after performing the action. To simulate this situation, we draw sam- 
ples repeatedly from a Gaussian density and pass them through a nonlinear 
transformation. The graphs in Figure 4.6 show the resulting samples and 
their kernel density estimates along with the true belief (gray area). Each 
graph in the upper row results from drawing 25 samples from the Gaussian. 
Contrary to the desired outcome, some of the kernel density estimates dif- 
fer substantially from the true density, and there is a large variability among 
the different kernel densities. Fortunately, the sampling variance decreases 
with the number of samples. The lower row in Figure 4.6 shows typical re- 
sults obtained with 250 samples. Obviously, the higher number of samples 
results in more accurate approximations with less variability. In practice, if 
enough samples are chosen, the observations made by a robot typically keep 
the sample based belief "close enough" to the true belief. 


Resampling 


The sampling variance is amplified through repetitive resampling. To un- 
derstand this problem, it will be useful to consider the extreme case, which 
is that of a robot whose state does not change. Sometimes, we know for a fact 
that x, = x-1. A good example is that of mobile robot localization for a robot 
that does not move. Let us furthermore assume that the robot possesses no 
sensors, hence it cannot estimate the state, and that it is unaware of the state. 
Obviously, such a robot can never find out anything about its location, hence 
the estimate at time t should be identical to its initial estimate, for any point 
in time t. 

Unfortunately, this is not the result of a vanilla particle filters. Initially, 
our particle set will be generated from the prior, and particles will be spread 
throughout the state space. However, the resampling step (line 8 in the al- 
gorithm) will occasionally fail to reproduce a state sample z/"l. Since our 
state transition is deterministic, no new states will be introduced in the for- 
ward sampling step (line 4). As time goes on, more and more particles are 
erased simply due to the random nature of the resampling step, without the 
creation of any new particles. The result is quite daunting: With probability 
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one, M identical copies of a single particle will survive; the diversity will 
disappear due to the repetitive resampling. To an outside observer, it may 
appear that the robot has uniquely determined the world state—an apparent 
contradiction to the fact that the robot possesses no sensors. 

This example hints at another limitation of particle filters with important 
practical ramifications. In particular, the resampling process induces a loss of 
diversity in the particle population, which in fact manifests itself as approx- 
imation error: Even though the variance of the particle set itself decreases, 
the variance of the particle set as an estimator of the true belief increases. 
Controlling this variance, or error, of the particle filter is essential for any 
practical implementation. 

There exist two major strategies for variance reduction. First, one may 
reduce the frequency at which resampling takes place. When the state is 
known to be static (zx; = 2.1) one should never resample. This is the case, 
for example, in mobile robot localization: When the robot stops, resampling 
should be suspended (and in fact it is usually a good idea to suspend the 
integration of measurements as well). Even if the state changes, it is often 
a good idea to reduce the frequency of resampling. Multiple measurements 
can always be integrated via multiplicatively updating the importance fac- 
tor as noted above. More specifically, it maintains the importance weight in 
memory and updates them as follows: 


[m] 


[m] 1 if resampling took place 
Wy i [m] N Š 
plz | xi ) w1 ifno resampling took place 


The choice of when to resample is intricate and requires practical experience: 
Resampling too often increases the risk of losing diversity. If one samples too 
infrequently, many samples might be wasted in regions of low probability. A 
standard approach to determining whether or not resampling should be per- 
formed is to measure the variance of the importance weights. The variance 
of the weights relates to the efficiency of the sample based representation. If 
all weights are identical, then the variance is zero and no resampling should 
be performed. If, on the other hand, the weights are concentrated on a small 
number of samples, then the weight variance is high and resampling should 
be performed. 

The second strategy for reducing the sampling error is known as low vari- 
ance sampling. Table 4.4 depicts an implementation of a low variance sam- 
pler. The basic idea is that instead of selecting samples independently of 
each other in the resampling process (as is the case for the basic particle filter 
in Table 4.3), the selection involves a sequential stochastic process. 
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1: Algorithm Low. variance sampler(X,, V): 
2: à -0 

3: r = rand(0; M7!) 

4: c= wl 

5: i=1 

6: form = 1 to M do 

7: U-r-t(m—-1).M^ 
8: while U > c 

9: i—i-cl 

10: = e+ wl 

11: endwhile 

12: add x!!! to X, 

13: endfor 

14: return X, 








Table 4.4 Low variance resampling for the particle filter. This routine uses a sin- 
gle random number to sample from the particle set 1 with associated weights W, 
yet the probability of a particle to be resampled is still proportional to its weight. 
Furthermore, the sampler is efficient: Sampling M particles requires O( M) time. 


Instead of choosing M random numbers and selecting those particles that 
correspond to these random numbers, this algorithm computes a single ran- 
dom number and selects samples according to this number but still with a 
probability proportional to the sample weight. This is achieved by drawing 
a random number r in the interval [0; M ^!], where M is the number of sam- 
ples to be drawn at time t. The algorithm in Table 4.4 then selects particles by 
repeatedly adding the fixed amount M^! to r and by choosing the particle 
that corresponds to the resulting number. Any number U in [0; 1] points to 
exactly one particle, namely the particle i for which 


j 
i = argmin 5 wl! >U 
了 m=1 
The while loop in Table 4.4 serves two tasks, it computes the sum in the right- 
hand side of this equation and additionally checks whether i is the index 


of the first particle such that the corresponding sum of weights exceeds U. 
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ro rt+M! r+2M... 
Figure 4.7 Principle of the low variance resampling procedure. We choose a random 
number r and then select those particles that correspond to u = r + (m — 1)- M^! 
where m = 1,..., M. 


The selection is then carried out in line 12. This process is also illustrated in 
Figure 4.7. 

The advantage of the low-variance sampler is threefold. First, it covers the 
space of samples in a more systematic fashion than the independent random 
sampler. This should be obvious from the fact that the dependent sampler 
cycles through all particles systematically, rather than choosing them inde- 
pendently at random. Second, if all the samples have the same importance 
factors, the resulting sample set X, is equivalent to X, so that no samples are 
lost if we resample without having integrated an observation into +. Third, 
the low-variance sampler has a complexity of O(M). Achieving the same 
complexity for independent sampling is difficult; obvious implementations 
require a O(log M) search for each particle once a random number has been 
drawn, which results in a complexity of O(M log M) for the entire resam- 
pling process. Computation time is of essence when using particle filters, 
and often an efficient implementation of the resampling process can make a 
huge difference in the practical performance. For these reasons, implemen- 
tations of particle filters in robotics tend to rely on mechanisms like the one 
just discussed. 

In general, the literature on efficient sampling is huge. Another popu- 
lar option is stratified sampling, in which particles are grouped into subsets. 
Sampling from these sets is performed in a two stage procedure. First, the 
number of samples drawn from each subset is determined based on the total 
weight of the particles contained in the subset. In the second stage, indi- 
vidual samples are drawn randomly from each subset using, for example, 
low variance resampling. Such a technique has lower sampling variance and 
tends to perform well when a robot tracks multiple, distinct hypotheses with 
a single particle filter. 
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Sampling Bias 


The fact that only finitely many particles are used also introduces a syste- 
matic bias in the posterior estimate. Consider the extreme case of M = 1 
particle. In this case, the loop in lines 3 through 7 in Table 4.3 will only be 
executed once, and 1; will contain only a single particle, sampled from the 
motion model. The key insight is that the resampling step (lines 8 through 
11 in Table 4.3) will now deterministically accept this sample, regardless of 
its importance factor wl". Hence the measurement probability p(z; | £?!) 
plays no role in the result of the update, and neither does z;. Thus, if M = 1, 
the particle filter generates particles from the probability p(z; | u14) instead 
of the desired posterior p(z, | u14, 21:4). It flatly ignores all measurements. 
How can this happen? 

The culprit is the normalization, implicit in the resampling step. When 
sampling in proportion to the importance weights (line 9 of the algorithm), 


wl) becomes its own normalizer if M = 1: 


p(draw oll inline9) = =1 





In general, the problem is that the non-normalized values w,[m] are drawn 
from an M-dimensional space, but after normalization they reside in a space 
of dimension M — 1. This is because after normalization, the m-th weight 
can be recovered from the M — 1 other weights by subtracting those from 
1. Fortunately, for larger values of M, the effect of loss of dimensionality, or 
degrees of freedom, becomes less and less pronounced. 


Particle Deprivation 


Even with a large number of particles, it may happen that there are no parti- 
cles in the vicinity of the correct state. This problem is known as the particle 
deprivation problem. It occurs mostly when the number of particles is too small 
to cover all relevant regions with high likelihood. However, one might argue 
that this ultimately must happen in any particle filter, regardless of the par- 
ticle set size M. 

Particle deprivation occurs as the result of the variance in random sam- 
pling; an unlucky series of random numbers can wipe out all particles near 
the true state. At each sampling step, the probability for this to happen is 
larger than zero (although it is usually exponentially small in M). Thus, we 
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only have to run the particle filter long enough. Eventually we will generate 
an estimate that is arbitrarily incorrect. 

In practice, problems of this nature only tend to arise when M is small 
relative to the space of all states with high likelihood. A popular solution to 
the particle deprivation problem is to add a small number of randomly gen- 
erated particles into the set after each resampling process, regardless of the 
actual sequence of motion and measurement commands. Such a methodol- 
ogy can reduce (but not fix) the deprivation problem, but at the expense of 
an incorrect posterior estimate. The advantage of adding random samples 
lies in its simplicity: The software modification necessary to add random 
samples in a particle filter is minimal. As a rule of thumb, adding random 
samples should be considered a measure of last resort, which should only be 
applied if all other techniques for fixing a deprivation problem have failed. 
Alternative approaches to dealing with particle deprivation will be discussed 
in Chapter 8, in the context of robot localization. 

This discussion showed that the quality of the sample based representation 
increases with the number of samples. An important question is therefore 
how many samples should be used for a specific estimation problem. Un- 
fortunately, there is no perfect answer to this question and it is often left to 
the user to determine the required number of samples. As a rule of thumb, 
the number of samples strongly depends on the dimensionality of the state 
space and the uncertainty of the distributions approximated by the particle 
filter. For example, uniform distributions require many more samples than 
distributions focused on a small region of the state space. A more detailed 
discussion on sample sizes will be given in the context of robot localization 
and mapping in future chapters of this book. 


Summary 


This section introduced two nonparametric Bayes filters, histogram filters 
and particle filters. Nonparametric filters approximate the posterior by a fi- 
nite number of values. Under mild assumptions on the system model and the 
shape of the posterior, both have the property that the approximation error 
converges uniformly to zero as the the number of values used to represent 
the posterior goes to infinity. 


* The histogram filter decomposes the state space into finitely many convex 
regions. It represents the cumulative posterior probability of each region 
by a single numerical value. 
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* There exist many decomposition techniques in robotics. In particular, the 
granularity of a decomposition may or may not depend on the structure of 
the environment. When it does, the resulting algorithms are often called 
‘topological.’ 


* Decomposition techniques can be divided into static and dynamic. Static 
decompositions are made in advance, irrespective of the shape of the be- 
lief. Dynamic decompositions rely on specifics of the robot's belief when 
decomposing the state space, often attempting to increase spatial resolu- 
tion in proportion to the posterior probability. Dynamic decompositions 
tend to give better results, but they are also more difficult to implement. 


e An alternative nonparametric technique is known as particle filter. Parti- 
cle filters represent posteriors by a random sample of states, drawn from 
the posterior. Such samples are called particles. Particle filters are ex- 
tremely easy to implement, and they are the most versatile of all Bayes 
filter algorithms represented in this book. 


e Specific strategies exist to reduce the error in particle filters. Among the 
most popular ones are techniques for reducing the variance of the esti- 
mate that arises from the randomness of the algorithm, and techniques 
for adapting the number of particles in accordance with the complexity of 
the posterior. 


The filter algorithms discussed in this and the previous chapter lay the 
groundwork for most probabilistic robotics algorithms discussed through- 
out the remainder of this book. The material presented here represents 
many of today's most popular algorithms and representations in probabilis- 
tic robotics. 


Bibliographical Remarks 


West and Harrison (1997) provides an in-depth treatment of several techniques discussed in this 
and the previous chapter. Histograms have been used in statistics for many decades. Sturges 
(1926) provides one of the early rules for selecting the resolution of a histogram approximation, 
and more recent treatment is by Freedman and Diaconis (1981). A contemporary analysis can 
be found in Scott (1992). Once a state space is mapped into a discrete histogram, the resulting 
temporal inference problem becomes an instance of a discrete Hidden Markov model, of the 
type made popular by Rabiner and Juang (1986). Two contemporary texts can be found in 
MacDonald and Zucchini (1997) and Elliott et al. (1995). 

Particle filters can be traced back to Metropolis and Ulam (1949), the inventors of Monte 
Carlo methods; see Rubinstein (1981) for a more contemporary introduction. The sampling 
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importance resampling technique, which is part of the particle filter, goes back to two seminal 
papers by Rubin (1988) and Smith and Gelfand (1992). Stratified sampling was first invented by 
Neyman (1934). In the past few years, particle filters have been studied extensively in the field 
of Bayesian statistics (Doucet 1998; Kitagawa 1996; Liu and Chen 1998; Pitt and Shephard 1999). 
In AI, particle filters were reinvented under the name survival of the fittest (Kanazawa et al. 1995); 
in computer vision, an algorithm called condensation by Isard and Blake (1998) applies them to 
tracking problems. A good contemporary text on particle filters is due to Doucet et al. (2001). 


Exercises 


1. In this exercise, you will be asked to implement a histogram filter for a 
linear dynamical system studied in the previous chapter. 


(a) Implement a histogram filter for the dynamical system described in 
Exercise 1 of the previous chapter (see page 81). Use the filter to predict 
a sequence of posterior distributions for t = 1,2,...,5. For each value 
of t, plot the joint posterior over x and à into a diagram, where x is the 
horizontal and z is the vertical axis. 


(b) Now implement the measurement update step into your histogram fil- 
ter, as described in Exercise 2 of the previous chapter (page 82). Sup- 
pose at time t = 5, we observe a measurement z = 5. State and plot the 
posterior before and after updating the histogram filter. 


2. You are now asked to implement the histogram filter for the nonlinear 
studied in Exercise 4 in the previous chapter (page 83). There, we stud- 
ied a nonlinear system defined over three state variables, and with the 
deterministic state transition 


: x + cos 
y = y t sin 
0' 0 


The initial state estimate was as follows: 


p = (00 0) and X = 0 001 0 
0 0 10000 


(a) Propose a suitable initial estimate for a histogram filter, which reflects 
the state of knowledge in the Gaussian prior. 
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(b) Implement a histogram filter and run its prediction step. Compare the 
resulting posterior with the one from the EKF and from your intuitive 
analysis. What can you learn about the resolution of the z-y coordi- 
nates and the orientation 0 in your histogram filter? 


(c 


— 


Now incorporate a measurement into your estimate. As before, the 
measurement shall be a noisy projection of the x-coordinate of the 
robot, with covariance Q — 0.01. Implement the step, compute the 
result, plot it, and compare it with the result of the EKF and your intu- 
itive drawing. 

Notice: When plotting the result of a histogram filter, you can show mul- 


tiple density plots, one for each discrete slice in the space of all 0-values. 


3. We talked about the effect of using a single particle. What is the effect 
of using M — 2 particles in particle filtering? Can you give an example 
where the posterior will be biased? If so, by what amount? 


4. Implement Exercise 1 using particle filters instead of histograms, and plot 
and discuss the results. 


5. Implement Exercise 2 using particle filters instead of histograms, and plot 
and discuss the results. Investigate the effect of varying numbers of par- 
ticles on the result. 
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Introduction 


This and the next chapter describe the two remaining components for im- 
plementing the filter algorithms described thus far: the motion and the mea- 
surement models. This chapter focuses on the motion model. Motion models 
comprise the state transition probability p(x; | we,2:-1), which plays an es- 
sential role in the prediction step of the Bayes filter. This chapter provides 
in-depth examples of probabilistic motion models as they are being used in 
actual robotics implementations. The subsequent chapter will describe prob- 
abilistic models of sensor measurements p(z; | x+), which are essential for the 
measurement update step. The material presented here will be essential for 
implementing any of the algorithms described in subsequent chapters. 

Robot kinematics, which is the central topic of this chapter, has been stud- 
ied thoroughly in past decades. However, it has almost exclusively been 
addressed in deterministic form. Probabilistic robotics generalizes kinematic 
equations to the fact that the outcome of a control is uncertain, due to control 
noise or unmodeled exogenous effects. Following the theme of this book, 
our description will be probabilistic: The outcome of a control will be de- 
scribed by a posterior probability. In doing so, the resulting models will be 
amenable to the probabilistic state estimation techniques described in the 
previous chapters. 

Our exposition focuses entirely on mobile robot kinematics for robots op- 
erating in planar environments. In this way, it is much more specific than 
most contemporary treatments of kinematics. No model of manipulator 
kinematics will be provided, neither will we discuss models of robot dy- 
namics. However, this restricted choice of material is by no means to be 
interpreted that probabilistic ideas are limited to simple kinematic models of 
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mobile robots. Rather, it is descriptive of the present state of the art, as prob- 
abilistic techniques have enjoyed their biggest successes in mobile robotics 
using relatively basic models of the types described in this chapter. The use 
of more sophisticated probabilistic models (e.g., probabilistic models of robot 
dynamics) remains largely unexplored in the literature. Such extensions, 
however, are not infeasible. As this chapter illustrates, deterministic robot 
actuator models are “probilified” by adding noise variables that characterize 
the types of uncertainty that exist in robotic actuation. 

In theory, the goal of a proper probabilistic model may appear to accu- 
rately model the specific types of uncertainty that exist in robot actuation 
and perception. In practice, the exact shape of the model often seems to be 
less important than the fact that some provisions for uncertain outcomes are 
provided in the first place. In fact, many of the models that have proven most 
successful in practical applications vastly overestimate the amount of uncer- 
tainty. By doing so, the resulting algorithms are more robust to violations of 
the Markov assumptions (Chapter 2.4.4), such as unmodeled state and the 
effect of algorithmic approximations. We will point out such findings in later 
chapters, when discussing actual implementations of probabilistic robotic al- 
gorithms. 


Preliminaries 


Kinematic Configuration 


Kinematics is the calculus describing the effect of control actions on the con- 
figuration of a robot. The configuration of a rigid mobile robot is commonly 
described by six variables, its three-dimensional Cartesian coordinates and 
its three Euler angles (roll, pitch, yaw) relative to an external coordinate 
frame. The material presented in this book is largely restricted to mobile 
robots operating in planar environments, whose kinematic state is summa- 
rized by three variables, referred to as pose in this text. 

The pose of a mobile robot operating in a plane is illustrated in Figure 5.1. 
It comprises its two-dimensional planar coordinates relative to an external 
coordinate frame, along with its angular orientation. Denoting the former as 
x and y (not to be confused with the state variable x+), and the latter by 0, the 
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<0.0> 
Figure 5.1 Robot pose, shown in a global coordinate system. 


pose of the robot is described by the following vector: 


ax 


y 
0 


The orientation of a robot is often called bearing, or heading direction. As 
shown in Figure 5.1, we postulate that a robot with orientation 0 — 0 points 
into the direction of its x-axis. A robot with orientation 0 = .57 points into 
the direction of its y-axis. 

Pose without orientation will be called location. The concept of location 
will be important in the next chapter, when we discuss measures to describe 
robot environments. For simplicity, locations in this book are usually de- 
scribed by two-dimensional vectors, which refer to the x-y coordinates of an 
object: 


x 
(z) 
The pose and the locations of objects in the environment may constitute the 
kinematic state Zi of the robot-environment system. 


Probabilistic Kinematics 


The probabilistic kinematic model, or motion model plays the role of the state 
transition model in mobile robotics. This model is the familiar conditional 
density 


p(z« | Ut, zia) 


Here x; and x,_; are both robot poses (and not just its z-coordinates), and 
ut is a motion command. This model describes the posterior distribution 
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(a) (b) 




















Figure 5.2 The motion model: Posterior distributions of the robot’s pose upon ex- 
ecuting the motion command illustrated by the solid line. The darker a location, 
the more likely it is. This plot has been projected into 2-D. The original density is 
three-dimensional, taking the robot’s heading direction 0 into account. 


over kinematic states that a robot assumes when executing the motion com- 
mand u at z;..1. In implementations, u; is sometimes provided by a robot's 
odometry. However, for conceptual reasons we will refer to u; as control. 

Figure 5.2 shows two examples that illustrate the kinematic model for a 
rigid mobile robot operating in a planar environment. In both cases, the 
robot's initial pose is z; ;. The distribution p(x; | ut, ;..1) is visualized 
by the shaded area: The darker a pose, the more likely it is. In this figure, 
the posterior pose probability is projected into x-y-space; the figure lacks a 
dimension corresponding to the robot's orientation. In Figure 5.2a, a robot 
moves forward some distance, during which it may accrue translational and 
rotational error as indicated. Figure 5.2b shows the resulting distribution 
of a more complicated motion command, which leads to a larger spread of 
uncertainty. 

This chapter provides in detail two specific probabilistic motion models 
p(z« | ut, 211), both for mobile robots operating in the plane. Both mod- 
els are somewhat complementary in the type of motion information that is 
being processed. The first assumes that the motion data u; specifies the veloc- 
ity commands given to the robot's motors. Many commercial mobile robots 
(e.g., differential drive, synchro drive) are actuated by independent transla- 
tional and rotational velocities, or are best thought of being actuated in this 
way. The second model assumes that one has access to odometry informa- 
tion. Most commercial bases provide odometry using kinematic information 
(distance traveled, angle turned). The resulting probabilistic model for inte- 
grating such information is somewhat different from the velocity model. 
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In practice, odometry models tend to be more accurate than velocity mod- 
els, for the simple reason that most commercial robots do not execute velocity 
commands with the level of accuracy that can be obtained by measuring the 
revolution of the robot’s wheels. However, odometry is only available after 
executing a motion command. Hence it cannot be used for motion planning. 
Planning algorithms such as collision avoidance have to predict the effects of 
motion. Thus, odometry models are usually applied for estimation, whereas 
velocity models are used for probabilistic motion planning. 


Velocity Motion Model 


The velocity motion model assumes that we can control a robot through two 
velocities, a rotational and a translational velocity. Many commercial robots 
offer control interfaces where the programmer specifies velocities. Drive 
trains commonly controlled in this way include differential drives, Acker- 
man drives, and synchro-drives. Drive systems not covered by our model 
are those without non-holonomic constraints, such as robots equipped with 
Mecanum wheels or legged robots. 

We will denote the translational velocity at time t by v+, and the rotational 
velocity by w+. Hence, we have 


Ut 
Ut 一 
Ut 


We arbitrarily postulate that positive rotational velocities w, induce a coun- 
terclockwise rotation (left turns). Positive translational velocities v, corre- 
spond to forward motion. 


Closed Form Calculation 


A possible algorithm for computing the probability p(x; | uz, x11) is shown 
in Table 5.1. It accepts as input an initial pose z;.1 = (v y 0)", a control 
ut = (v w)", and a hypothesized successor pose zi = (2’ y' 0')T. It out- 
puts the probability p(a; | ut, x+—1) of being at x, after executing control u: 
beginning in state z,..,, assuming that the control is carried out for the fixed 
duration At. The parameters al to og are robot-specific motion error param- 
eters. The algorithm in Table 5.1 first calculates the controls of an error-free 
robot; the meaning of the individual variables in this calculation will become 
more apparent below, when we derive it. These parameters are given by 6 
and à. 
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(a) (b) (c) 





Figure 5.3 The velocity motion model, for different noise parameter settings. 


The function prob(z, b?) models the motion error. It computes the proba- 
bility of its parameter x under a zero-centered random variable with variance 
b?. Two possible implementations are shown in Table 5.2, for error variables 
with normal distribution and triangular distribution, respectively. 

Figure 5.3 shows graphical examples of the velocity motion model, pro- 
jected into x-y-space. In all three cases, the robot sets the same translational 
and angular velocity. Figure 5.3a shows the resulting distribution with mod- 
erate error parameters a; to ag. The distribution shown in Figure 5.3b is 
obtained with smaller angular error (parameters as and o4) but larger trans- 
lational error (parameters o, and a2). Figure 5.3c shows the distribution 
under large angular and small translational error. 


Sampling Algorithm 


For particle filters (c.f. Chapter 4.3), it suffices to sample from the motion 
model p(x; | wz, 1,1), instead of computing the posterior for arbitrary z;, ut 
and z;,. Sampling from a conditional density is different than calculating 
the density: In sampling, one is given u, and r;., and seeks to generate 
a random z, drawn according to the motion model p(x; | ut, z; 1). When 
calculating the density, one is also given x; generated through other means, 
and one seeks to compute the probability of x; under p(z; | ui, 241). 

The algorithm sample motion model velocity in Table 5.3 generates ran- 
dom samples from p(x; | ui, 24.1) for a fixed control w and pose z,. 4. It 
accepts x, and uw; as input and generates a random pose x, according to 
the distribution p(x | ut, z; 1). Line 2 through 4 “perturb” the commanded 
control parameters by noise, drawn from the error parameters of the kine- 
matic motion model. The noise values are then used to generate the sample's 
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1: Algorithm motion_model_velocity(x;, ut, x;..1): 
2: _ 1 (x — z')cos0 + (y — y') sin0 
| —.2 (y — y) cos — (x — x") sin 0 
. THL 
3: gc ny) 
+4 F 
| y = 2 + ue! -2) 
5: r* = V(z -a*y + (y—y*)? 
6: A0 = atan2(y' — y*, 2’ — z*) — atan2(y — y*, x — x“) 
A0 
7 0-—-——r* 
MOX 
~ A0 
8 Ww = At 
9 y= Kf -Ô 
10: return prob(v — $,04v? + agw?) - prob(w — &, agv? + asw?) 
: prob(4, ov? + asw?) 








Table 5.1 Algorithm for computing p(x: | ut, zt-1) based on velocity information. 
Here we assume zi_l is represented by the vector (2 y 0)"; zi is represented by 
(x! y' 0')'; and uz is represented by the velocity vector (v w)”. The function 
prob(a, b?) computes the probability of its argument a under a zero-centered dis- 


tribution with variance b°. It may be implemented using any of the algorithms in 
Table 5.2. 





1: Algorithm prob normal distribution(a, 52): 


2: return 





1 1a? 
V 2r b? TN xps 
3: Algorithm prob triangular distribution(a, b°): 


4: return max | >= 一 ss] 











Table 5.2 Algorithms for computing densities of a zero-centered normal distribu- 
tion and a triangular distribution with variance b°. 
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1: Algorithm sample motion model velocity(u;, x;—1): 
2: 0 = v 十 sample(alv2 + agw?) 

3: Q = w +sample(a3v? + agw?) 

A: ^ = sample(asv? + agw?) 

5: a’ —z — Psin0- Ê sin(0 + GA) 

6: y' = y + $ cos0 — 2 cos(0 + GAt) 

7: 0 —0--OAt t SAC 

8: return x; = (z',y',0')* 











Table 5.3 Algorithm for sampling poses zt = (x’ y' 0')' from a pose zii = 


(x y 0)" anda control u; = (v w)”. Note that we are perturbing the final orientation 
by an additional random term, 4. The variables a; through as are the parameters of 
the motion noise. The function sample(»^) generates a random sample from a zero- 
centered distribution with variance b?. It may, for example, be implemented using 
the algorithms in Table 5.4. 





1: Algorithm sample normal distribution(»?): 
12 

2: return 5 3 rand(—b, b) 

3: Algorithm sample triangular distribution(b?): 
v6 

4: return r3 [rand(—b, b) + rand(—5, b)| 











Table 5.4 Algorithm for sampling from (approximate) normal and triangular dis- 
tributions with zero mean and variance b?; see Winkler (1995: p293). The function 
rand(z, y) is assumed to be a pseudo random number generator with uniform distri- 
bution in [z, y]. 
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(a) (b) 





Figure 5.4 Sampling from the velocity motion model, using the same parameters as 
in Figure 5.3. Each diagram shows 500 samples. 


new pose, in lines 5 through 7. Thus, the sampling procedure implements a 
simple physical robot motion model that incorporates control noise in its pre- 
diction, in just about the most straightforward way. Figure 5.4 illustrates the 
outcome of this sampling routine. It depicts 500 samples generated by sam- 
ple_motion_model_velocity. The reader might want to compare this figure 
with the density depicted in Figure 5.3. 

We note that in many cases, it is easier to sample x; than calculate the den- 
sity of a given x+. This is because samples require only a forward simulation 
of the physical motion model. To compute the probability of a hypothetical 
pose amounts to retro-guessing of the error parameters, which requires us 
to calculate the inverse of the physical motion model. The fact that particle 
filters rely on sampling makes them specifically attractive from an imple- 
mentation point of view. 





Mathematical Derivation of the Velocity Motion Model 


We will now derive the algorithms motion_model_velocity and sam- 
ple_motion_model_velocity. As usual, the reader not interested in the 
mathematical details is invited to skip this section at first reading, and con- 
tinue in Chapter 5.4 (page 132). The derivation begins with a generative 
model of robot motion, and then derives formulae for sampling and com- 
puting p(x | ut, 24-1) for arbitrary z+, uz, and x1. 





Exact Motion 
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<Xcyc> x 


Figure 5.5 Motion carried out by a noise-free robot moving with constant velocities 
v and w and starting at (x y 0)". 


Before turning to the probabilistic case, let us begin by stating the kinematics 
for an ideal, noise-free robot. Let w = (v w)7 denote the control at time t. If 
both velocities are kept at a fixed value for the entire time interval (t — 1, t], 
the robot moves on a circle with radius 


r = 








This follows from the general relationship between the translational and ro- 
tational velocities v and w for an arbitrary object moving on a circular trajec- 
tory with radius r: 


Uu-—u-r 


Equation (5.5) encompasses the case where the robot does not turn at all (i.e., 
w — 0), in which case the robot moves on a straight line. A straight line 
corresponds to a circle with infinite radius, hence we note that r may be 
infinite. 

Let x1 = (v, y, 0)? be the initial pose of the robot, and suppose we keep 
the velocity constant at (v w)! for some time At. As one easily shows, the 
center of the circle is at 


qu. = z — — sinó 
w 
U 

Ye = 4 十 一 cos0 
w 
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The variables (x. ye)” denote this coordinate. After At time of motion, our 
ideal robot will be at x; = (z^, y', 6’)? with 





ax! ze + sin(O+wAt) 
y = Uc — 2 cos(0 + wAt) 
6' 0 十 wAL 
x —2 sind + 2? sin(0 + wAt) 
= y |+ Z cos — 7 cos(0 + wAt) 
0 wAt 


The derivation of this expression follows from simple trigonometry: After 
At units of time, the noise-free robot has progressed v - At along the circle, 
which caused its heading direction to turn by w- At. At the same time, its x 
and y coordinate is given by the intersection of the circle about (xe ye)”, and 
the ray starting at (xe ye)” at the angle perpendicular to w - At. The second 
transformation simply substitutes (5.8) into the resulting motion equations. 

Of course, real robots cannot jump from one velocity to another, and keep 
velocity constant in each time interval. To compute the kinematics with non- 
constant velocities, it is therefore common practice to use small values for 
At, and to approximate the actual velocity by a constant within each time 
interval. The (approximate) final pose is then obtained by concatenating 
the corresponding cyclic trajectories using the mathematical equations just 
stated. 


Real Motion 

In reality, robot motion is subject to noise. The actual velocities differ from 
the commanded ones (or measured ones, if the robot possesses a sensor for 
measuring velocity). We will model this difference by a zero-centered ran- 
dom variable with finite variance. More precisely, let us assume the actual 
velocities are given by 


Oo) a 

w Easau2 十 ad2 
Here £j» is a zero-mean error variable with variance b?. Thus, the true veloc- 
ity equals the commanded velocity plus some small, additive error (noise). 
In our model, the standard deviation of the error is proportional to the com- 
manded velocity. The parameters al to o4 (with o; > 0 for i = 1,...,4) are 


robot-specific error parameters. They model the accuracy of the robot. The 
less accurate a robot, the larger these parameters. 


Ss 





&> 
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(5.11) 


TRIANGULAR 
DISTRIBUTION 


(5.12) 


(5.13) 
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(a) (b) 











-b b -b b 


Figure 5.6 Probability density functions with variance b*: (a) Normal distribution, 
(b) triangular distribution. 


Two common choices for the error ej» are the normal and the triangular 
distribution. 

The normal distribution with zero mean and variance b? is given by the den- 
sity function 





1 a2 
&(a) = NETT e 3s 


Figure 5.6a shows the density function of a normal distribution with variance 
b?. Normal distributions are commonly used to model noise in continuous 
stochastic processes. Its support, which is the set of points a with p(a) > 0, is 
R. 

The density of a triangular distribution with zero mean and variance b? is 
given by 


(i) = mado. - d 
€w(a) = max <0, Jeb Oe 
which is non-zero only in (—V6b; 6b). As Figure 5.6b suggests, the density 


resembles the shape of a symmetric triangle—hence the name. 
A better model of the actual pose x; = (a’ y' 0')T after executing the 





motion command us = (v w)T atz,., = (x y 6)" is thus 
a! x —2 sin0 + Ê sin(0 + GA) 
y = y |+ 2 cos0 — 2 cos(0 + GA) 


6” 0 At 
This equation is obtained by substituting the commanded velocity u, = 


(v w)? with the noisy motion (ô )” in (5.9). However, this model is still not 
very realistic, for reasons discussed in turn. 
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Final Orientation 


The two equations given above exactly describe the final location of the robot 
given that the robot actually moves on an exact circular trajectory with ra- 
dius r = 2. While the radius of this circular segment and the distance 
traveled is influenced by the control noise, the very fact that the trajectory 
is circular is not. The assumption of circular motion leads to an important 
degeneracy. In particular, the support of the density p(x; | ut, x11) is two- 
dimensional, within a three-dimensional embedding pose space. The fact 
that all posterior poses are located on a two-dimensional manifold within 
the three-dimensional pose space is a direct consequence of the fact that we 
used only two noise variables, one for v and one for v. Unfortunately, this 
degeneracy has important ramifications when applying Bayes filters for state 
estimation. 

In reality, any meaningful posterior distribution is of course not degener- 
ate, and poses can be found within a three-dimensional space of variations 
in x, y, and 0. To generalize our motion model accordingly, we will assume 
that the robot performs a rotation 7 when it arrives at its final pose. Thus, 
instead of computing 6’ according to (5.13), we model the final orientation 


by 


0 = 04+HAt+%4At 
with 
了 = Ea5u2 十 a6w2 


Here as and ag are additional robot-specific parameters that determine the 
variance of the additional rotational noise. Thus, the resulting motion model 
is as follows: 





a! x 2sind + Ê sin(0 + At) 
y = y | + | Scos@— 2 cos(0 + wWAt) 
o 0 DAt + At 


Computation of p(£+ | Ut, $41) 


The algorithm motion_model_velocity in Table 5.1 implements the compu- 
tation of p(x; | ut, £+—1) for given values of zt 1 = (x y 0)7, u = (v w)*, 
and x; = (z' y' 0')T. The derivation of this algorithm is somewhat involved, 
as it effectively implements an inverse motion model. In particular, mo- 
tion model velocity determines motion parameters à, = (6 &)T from the 
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(5.17) 


(5.18) 


(5.19) 


(5.20) 


(5.21) 
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poses x;_; and x;, along with an appropriate final rotation ^. Our derivation 
makes it obvious as to why a final rotation is needed: For almost all values 
of z; 1, Uz, and Zi the motion probability would simply be zero without 
allowing for a final rotation. 

Let us calculate the probability p(x; | ut, Zit_1) of control action u; = (v w) 
carrying the robot from the pose z, 1 = (x y @)* to the pose x, = (2’ y! 6’)? 
within At time units. To do so, we will first determine the control à = (6 &)T 
required to carry the robot from z, 4 to position (z' y^), regardless of the 
robot's final orientation. Subsequently, we will determine the final rotation 
^j necessary for the robot to attain the orientation 0’. Based on these calcula- 
tions, we can then easily calculate the desired probability p(x; | ut, 1.1). 

The reader may recall that our model assumes that the robot travels with 
a fixed velocity during At, resulting in a circular trajectory. For a robot that 
moved from z,., = (x y 0)" to a, = (z' y)T, the center of the circle is 
defined as (z* y*)7 and given by 


( a ) z ( x )+( —\sin 0 z | aie + u(y —y') 
y* y A cos Ü I5 + u(x — x) 
for some unknown A, u € R. The first equality is the result of the fact that 
the circle’s center is orthogonal to the initial heading direction of the robot; 
the second is a straightforward constraint that the center of the circle lies on 
a ray that lies on the half-way point between (x y)” and (z/ y')' and is 
orthogonal to the line between these coordinates. 

Usually, Equation (5.17) has a unique solution—except in the degenerate 


case of w — 0, in which the center of the circle lies at infinity. As the reader 
might want to verify, the solution is given by 


T 








1 (r—2)cos0-F(y-— ysinO 
2 (y— y')cos0 — (a — z')sin0 




















and hence 
a —a’) cos 0+(y—y’) sin 6 
( g* ) (9-58 Sere (UV) 
* a aba)! op! OL (ar y sine 
y CM Ei epee ees (2! — 2) 


The radius of the circle is now given by the Euclidean distance 
r= VE TO = VW- rP y 


Furthermore, we can now calculate the change of heading direction 








A0 = atan2(y' — y*, x" — z*) — atan2(y — y*, xz — x^) 
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(5.27) 
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(5.29) 
(5.30) 
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Here atan2 is the common extension of the arcus tangens of y/x extended 
to the R? (most programming languages provide an implementation of this 
function): 


atan(y/x) ifs >0 
sign 7 — atan(|y/x ifr«0 

uiui us. = Rd (y) ( (ly/xl)) Eu 
sign(y) 7/2 ifr—0,y 40 


Since we assume that the robot follows a circular trajectory, the translational 
distance between x; and z,. ; along this circle is 


Adist = r* - A 


From Adist and AQ, it is now easy to compute the velocities ó and à: 


, = Adist 
a= (2) se (484) 


The rotational velocity ^ needed to achieve the final heading 6’ of the robot 
in (z'y') within At can be determined according to (5.14) as: 


$ 


&> 


5 = At (9 —0)-à 


The motion error is the deviation of à; and ^ from the commanded velocity 
uz = (v w)T and y = 0, as defined in Equations (5.24) and (5.25). 


Uer = U—V 
Wer = W— ®© 
Yer = Y 


Under our error model, specified in Equations (5.10), and (5.15), these errors 
have the following probabilities: 
Eayv2+aqw2 (Verr) 
Easu2 十 ad2 (Werr) 
Easv2+agw2 (Yer) 


where so denotes a zero-mean error variable with variance b?, as before. 
Since we assume independence between the different sources of error, the 
desired probability p(x; | ut, z;.1) is the product of these individual errors: 


p(zi | Ut, 243) = Eq v2+aqw? (Verr) * Ea3su2 十 ad4w2 (werr) ` Easv2+agw? (Yerr) 
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To see the correctness of the algorithm motion_model_velocity in Table 5.1, 
the reader may notice that this algorithm implements this expression. More 
specifically, lines 2 to 9 are equivalent to Equations (5.18), (5.19), (5.20), (5.21), 
(5.24), and (5.25). Line 10 implements (5.32), substituting the error terms as 
specified in Equations (5.29) to (5.31). 


Sampling from p(x’ | u, x) 


The sampling algorithm sample_motion_model_velocity in Table 5.3 imple- 
ments a forward model, as discussed earlier in this section. Lines 5 through 7 
correspond to Equation (5.16). The noisy values calculated in lines 2 through 
4 correspond to Equations (5.10) and (5.15). 

The algorithm sample_normal_distribution in Table 5.4 implements a 
common approximation to sampling from a normal distribution. This ap- 
proximation exploits the central limit theorem, which states that any av- 
erage of non-degenerate random variables converges to a normal distribu- 
tion. By averaging 12 uniform distributions, sample_normal_distribution 
generates values that are approximately normal distributed; though 
technically the resulting values lie always in [—20,2b]. Finally, sam- 
ple triangular distribution in Table 5.4 implements a sampler for triangular 
distributions. 





Odometry Motion Model 


The velocity motion model discussed thus far uses the robot's velocity to 
compute posteriors over poses. Alternatively, one might want to use the 
odometry measurements as the basis for calculating the robot's motion over 
time. Odometry is commonly obtained by integrating wheel encoder infor- 
mation; most commercial robots make such integrated pose estimation avail- 
able in periodic time intervals (e.g., every tenth of a second). This leads to 
a second motion model discussed in this chapter, the odometry motion model. 
The odometry motion model uses odometry measurements in lieu of con- 
trols. 

Practical experience suggests that odometry, while still erroneous, is usu- 
ally more accurate than velocity. Both suffer from drift and slippage, but 
velocity additionally suffers from the mismatch between the actual motion 
controllers and its (crude) mathematical model. However, odometry is only 
available in retrospect, after the robot moved. This poses no problem for fil- 
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Srot2 






Otrans 


Figure 5.7 Odometry model: The robot motion in the time interval (t — 1, ¢] is ap- 
proximated by a rotation ôrot1, followed by a translation 0trans and a second rotation 
ôrot2. The turns and translations are noisy. 


ter algorithms, such as the localization and mapping algorithms discussed 
in later chapters. But it makes this information unusable for accurate motion 
planning and control. 


Closed Form Calculation 


Technically, odometric information are sensor measurements, not controls. 
To model odometry as measurements, the resulting Bayes filter would have 
to include the actual velocity as state variables—which increases the dimen- 
sion of the state space. To keep the state space small, it is therefore common 
to consider odometry data as if it were control signals. In this section, we 
will treat odometry measurements just like controls. The resulting model is 
at the core of many of today's best probabilistic robot systems. 

Let us define the format of our control information. At time t, the correct 
pose of the robot is modeled by the random variable z;. The robot odome- 
try estimates this pose; however, due to drift and slippage there is no fixed 
coordinate transformation between the coordinates used by the robot's in- 
ternal odometry and the physical world coordinates. In fact, knowing this 
transformation would solve the robot localization problem! 

The odometry model uses the relative motion information, as measured by 
the robot's internal odometry. More specifically, in the time interval (t — 1, t], 
the robot advances from a pose x;_1 to pose x+. The odometry reports back 
to us a related advance from z, , = (Z y 0)" to z, = (a^ y' 0')'. Here the 
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A 


Algorithm motion model odometry(z;, ut, x;—1): 


Orotl = atan2(y' — g,z' — z) — 0 


Otrans = V e Bye RE (y = yy 
0iot2 = 6! = 0 = Óroti 





Órot1 = atan2(y' — y,a! — x) — 0 
办 P V (x nd s T (y m y’)? 
Lm -0g-0— Êrot1 





IS 52 52 
P1 = prob(6rot1 = Srot1» 0185 EF O20t ans) 


5 A2 A2 $2 
pa = Prob(btrans m Otrans， Q30frans + a4 orot1 + 0407512) 





10: pa = prob(ó;oto = Oot2， a1 02449 EN a20Zans) 


11: return pı : p2 * pa 











Table5.5 Algorithm for computing p(x: | ut, zi-1) based on odometry information. 


m ou 


Here the control uz is given by (zi1 z+)”, with zi = (z y 0) and z = (a y' 6’). 


bar indicates that these are odometry measurements embedded in a robot- 
internal coordinate whose relation to the global world coordinates is un- 
known. The key insight for utilizing this information in state estimation is 
that the relative difference between z,..; and 7+, under an appropriate defi- 
nition of the term "difference," is a good estimator for the difference of the 
true poses z,..; and z;. The motion information u; is, thus, given by the pair 


Tii 
Ut = = 
Tt 


To extract relative odometry, u+ is transformed into a sequence of three steps: 
a rotation, followed by a straight line motion (translation), and another ro- 
tation. Figure 5.7 illustrates this decomposition: the initial turn is called 
Oroti, the translation dtrans, and the second rotation 6, 42. As the reader 
easily verifies, each pair of positions (5 5') has a unique parameter vector 
(Óroti trans Srot2)’, and these parameters are sufficient to reconstruct the 
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(a) (b) (c) 


Figure 5.8 The odometry motion model, for different noise parameter settings. 


relative motion between s and 8’. Thus, drot1, trans, Srot2 form together a suf- 
ficient statistics of the relative motion encoded by the odometry. 

The probabilistic motion model assumes that these three parameters are 
corrupted by independent noise. The reader may note that odometry motion 
uses one more parameter than the velocity vector defined in the previous 
section, for which reason we will not face the same degeneracy that led to 
the definition of a “final rotation.” 

Before delving into mathematical detail, let us state the basic algorithm 
for calculating this density in closed form. Table 5.5 depicts the algorithm 
for computing p(x; | ur, 1-1) from odometry. This algorithm accepts as an 
input an initial pose z;_1, a pair of poses u; = (z, 1 Z+)7 obtained from the 
robot's odometry, and a hypothesized final pose x+. It outputs the numerical 
probability p(x, | ut, 4-1). 

Lines 2 to 4 in Table 5.5 recover relative motion parameters 
(Orot1 Otrans brotz)7 from the odometry readings. As before, they implement 
an inverse motion model. The corresponding relative motion parameters 
(fista Didis Orta). for the given poses z,.. and 2; are calculated in lines 5 
through 7 of this algorithm. Lines 8 to 10 compute the error probabilities for 
the individual motion parameters. As above, the function prob(a, b?) imple- 
ments an error distribution over a with zero mean and variance b?. Here the 
implementer must observe that all angular differences must lie in [—7, 7]. 
Hence the outcome of ó,4,4 — drot2 has to be truncated correspondingly—a 
common error that tends to be difficult to debug. Finally, line 11 returns the 
combined error probability, obtained by multiplying the individual error 
probabilities pı, p», and ps. This last step assumes independence between 
the different error sources. The variables a; through a4 are robot-specific 
parameters that specify the noise in robot motion. 
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A 


Algorithm sample motion model odometry(u;, v, 1): 





roti = atan2(g' — g,2/ — =) — 0 
Otrans = V e Bye RE (y = yy 
Srot2 = 6! = 0 nu Óroti 





S 2 2 
Óroti m Orotl = sample(aigiotl E a20trans) 


S et 2 2 2 
Ótrans = Ótrans 一 sample(aóz i. + 04 foti + 4 0:42) 





S 2 2 
Srot2 oa Srot2 IE sample(o167,» T a20trans) 


vw 三 外 十 Dis cos(6 + cin) 
. y 二 2 下 aas sin(0 十 Ôrot1) 
10: 0 = 6 + roi + drot2 





11: return x; = (2',y’,0’)7 











Table 5.6 Algorithm for sampling from p(x: | ui, x:-1) based on odometry informa- 
tion. Here the pose at time t is represented by 2:1 = (x y 0)". The control is a differ- 
entiable set of two pose estimates obtained by the robot's odometer, u: = (Z1-1 Zt)", 


m on 


with zi L1 = (Z y 0)and zy = (Z y’ 6’). 


Figure 5.8 shows examples of our odometry motion model for different 
values of the error parameters al to o4. The distribution in Figure 5.8a is a 
typical one, whereas the ones shown in Figures 5.8b and 5.8c indicate un- 
usually large translational and rotational errors, respectively. The reader 
may want to carefully compare these diagrams with those in Figure 5.3 on 
page 122. The smaller the time between two consecutive measurements, the 
more similar those different motion models. Thus, if the belief is updated 
frequently e.g., every tenth of a second for a conventional indoor robot, the 
difference between these motion models is not very significant. 
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(a) 





Figure 5.9 Sampling from the odometry motion model, using the same parameters 
as in Figure 5.8. Each diagram shows 500 samples. 


Sampling Algorithm 


If particle filters are used for localization, we would also like to have an al- 
gorithm for sampling from p(z, | ut, x;..1). Recall that particle filters (Chap- 
ter 4.3) require samples of p(x; | uz, 24-1), rather than a closed-form expres- 
sion for computing p(x; | ut, Zit_1) for any v1, wz, and z,. The algorithm 
sample_motion_model_odometry, shown in Table 5.6, implements the sam- 
pling approach. It accepts an initial pose x,_; and an odometry reading us 
as input, and outputs a random 2; distributed according to p(x, | ui, 24.1). 
It differs from the previous algorithm in that it randomly guesses a pose z; 
(lines 5-10), instead of computing the probability of a given z,. As before, 
the sampling algorithm sample motion model odometry is somewhat eas- 
ier to implement than the closed-form algorithm motion model odometry, 
since it side-steps the need for an inverse model. 

Figure 5.9 shows examples of sample sets generated by sam- 
ple motion model odometry, using the same parameters as in the model 
shown in Figure 5.8. Figure 5.10 illustrates the motion model "in action" 
by superimposing sample sets from multiple time steps. This data has been 
generated using the motion update equations of the algorithm particle filter 
(Table 4.3), assuming the robot's odometry follows the path indicated by 
the solid line. The figure illustrates how the uncertainty grows as the robot 
moves. The samples are spread across an increasingly large space. 











Mathematical Derivation of the Odometry Motion Model 


The derivation of the algorithms is relatively straightforward, and once again 
may be skipped at first reading. To derive a probabilistic motion model using 
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(5.34) 
(5.35) 
(5.36) 
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Start location 


T 





10 meters 


Figure 5.10 Sampling approximation of the position belief for a non-sensing robot. 
The solid line displays the actions, and the samples represent the robot's belief at 
different points in time. 


odometry, we recall that the relative difference between any two poses is rep- 
resented by a concatenation of three basic motions: a rotation, a straight-line 
motion (translation), and another rotation. The following equations show 
how to calculate the values of the two rotations and the translation from the 


odometry reading us = (元 -1 2,)7, with z,., = (z y 0) and z; = (z' y' 0): 





Orot) = atan2(y —g,z/—z)—8 
Otrans = V = pl)? + (7 "E y)? 
Órot2 一 6 一 0 z 0iotl 


To model the motion error, we assume that the “true” values of the rotation 
and translation are obtained from the measured ones by subtracting inde- 
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pendent noise £5» with zero mean and variance b: 


ÜÓroti 一 Orotl m Ead? e +0262, 
Otrans a Otrans — Eag Sans taa O2, 41 Haa 622 
Jrota = Órot2 — Ead? y taaan 





As in the previous section, €,2 is a zero-mean noise variable with variance p. 
The parameters al to o4 are robot-specific error parameters, which specify 
the error accrued with motion. 

Consequently, the true position, x+, is obtained from z+—1 by an initial rota- 
tion with angle Ost; followed by a translation with distance ee followed 
by another rotation with angle ôrot2. Thus, 





z' Tz cans cos(0 EIS Ôrot1) 
y = y T Otrans sin(0 uS Órot1) 
0' 0 Óroti ES Srot2 


Notice that algorithm sample_motion_model_odometry implements Equa- 
tions (5.34) through (5.40). 

The algorithm motion_model_odometry is obtained by noticing that lines 
5-7 compute the motion parameters Srotty Ôtrans, and ôrot2 for the hypothe- 
sized pose z;, relative to the initial pose Zt_1. The difference of both, 





Orotl e Óroti 


Otrans V Otrans 


drot2 = Órot2 


is the error in odometry, assuming of course that x; is the true final pose. The 
error model (5.37) to (5.39) implies that the probability of these errors is given 


by 


Di = 85al62 十 ao62 (biotl — Ôrot1) 


rot1 | trans 


pa = Eas 62 +a4 02 (1-04 025 (Ótzans m Ótrans) 


trans 


(Orot2 = Órot2) 





P3 = 5al62 54-302 


trans 


with the distributions £ defined as above. These probabilities are computed 
in lines 8-10 of our algorithm motion model odometry, and since the errors 
are assumed to be independent, the joint error probability is the product pi : 
p2- ps (c.f., line 11). 
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Motion and Maps 


By considering p(x; | we, 21-1), we defined robot motion in a vacuum. In par- 
ticular, this model describes robot motion in the absence of any knowledge 
about the nature of the environment. In many cases, we are also given a 
map m, which may contain information pertaining to the places that a robot 
may or may not be able to navigate. For example, occupancy maps, which 
will be explained in Chapter 9, distinguish free (traversable) from occupied 
terrain. The robot’s pose must always be in the free space. Therefore, know- 
ing m gives us further information about the robot pose x; before, during, 
and after executing a control u;. 

This consideration calls for a motion model that takes the map m into ac- 
count. We will denote this model by p(x; | ui, x«1, m), indicating that it 
considers the map m in addition to the standard variables. If m carries infor- 
mation relevant to pose estimation, we have 


plar | Ut, £t-1) Æ plx | Ut, Ze-1, M) 


The motion model p(x, | ut, 4-1, m) should give better results than the map- 
free motion model p(x, | ut, x; .1). We will refer to p(x, | Ut, z; 1, m) as map- 
based motion model. The map-based motion model computes the likelihood 
that a robot placed in a world with map m arrives at pose x, upon execut- 
ing action u, at pose x;_,. Unfortunately, computing this motion model in 
closed form is difficult. This is because to compute the likelihood of being 
at x, after executing action u;, one has to incorporate the probability that an 
unoccupied path exists between x;_; and x, and that the robot might have 
followed this unoccupied path when executing the control u;—a complex 
operation. 

Luckily, there exists an efficient approximation for the map-based motion 
model, which works well if the distance between z+—1 and Zi is small (e.g., 
smaller than half a robot diameter). The approximation factorizes the map- 
based motion model into two components: 


P(T | uc, 2:1) p(t | m) 
p(z) 





p(zi | Ut, 4—1; M) = 


where ņ is the usual normalizer. Usually, p(z;) is also uniform and can be 
subsumed into the constant normalizer. One then simply multiplies the map- 
free estimate p(x; | ut, 21-1) with a second term, p(z,; | m), which expresses 
the “consistency” of pose x; with the map m. In the case of occupancy maps, 
p(z; | m) = 0 if and only if the robot would be placed in an occupied grid cell 
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Algorithm motion model with map(r,;, ut, x; 1, m): 
2: return p(x | us zii) + p(t | m) 
1: Algorithm sample motion model with map(u;, x1, m): 
2: do 
3: x, = sample motion model(u,, x; 1) 
3: T = p(t: | m) 
4: until 1 > 0 
5: return (z+, 7) 





Table 5.7 Algorithm for computing p(x: | ut, xi—1, m), which utilizes a map m of 
the environment. This algorithms bootstraps previous motion models (Tables 5.1, 5.3, 
5.5, and 5.6) to models that take into account that robots cannot be placed in occupied 
space in the map m. 


in the map; otherwise it assumes a constant value. By multiplying p(x; | m) 
and p(x; | uz, 24 1), we obtain a distribution that assigns all probability mass 
to poses x, consistent with the map, which otherwise has the same shape as 
p(z« | ut, 2151). As m can be computed by normalization, this approxima- 
tion of a map-based motion model can be computed efficiently without any 
significant overhead compared to a map-free motion model. 

Table 5.7 states the basic algorithms for computing and for sampling from 
the map-based motion model. Notice that the sampling algorithm returns 
a weighted sample, which includes an importance factor proportional to 
p(z« | m). Care has to be taken in the implementation of the sample version, 
to ensure termination of the inner loop. An example of the motion model 
is illustrated in Figure 5.11. The density in Figure 5.11a is p(x; | ui, zi 1), 
computed according to the velocity motion model. Now suppose the map m 
possesses a long rectangular obstacle, as indicated in Figure 5.11b. The prob- 
ability p(x; | m) is zero at all poses x; where the robot would intersect the 
obstacle. Since our example robot is circular, this region is equivalent to the 
obstacle grown by a robot radius—this is equivalent to mapping the obsta- 
cle from workspace to the robot's configuration space or pose space. The resulting 
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(a) p(z« | ut, zi—1) (b) p(z« | ut, ve—1,m) 
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Figure 5.11 Velocity motion model (a) without a map and (b) conditioned on a map 
m. 


probability p(z; | uz, 4-1, m), shown in Figure 5.11b, is the normalized prod- 
uct of p(x; | m) and p(x; | ui, 74-1). It is zero in the extended obstacle area, 
and proportional to p(z, | ut, 1-1) everywhere else. 

Figure 5.11 also illustrates a problem with our approximation. The region 
marked (x) possesses non-zero likelihood, since both p(x; | ut, 24.1) and 
p(z; | m) are non-zero in this region. However, for the robot to be in this 
particular area it must have gone through the wall, which is impossible in 
the real world. This error is the result of checking model consistency at the 
final pose x; only, instead of verifying the consistency of the robot's path 
to the goal. In practice, however, such errors only occur for relatively large 
motions u;, and it can be neglected for higher update frequencies. 

To shed light onto the nature of the approximation, let us briefly derive it. 
Equation (5.48) can be obtained by applying Bayes rule: 


plar | ui zi 51,m). = NPM | zi Ut, 2i) pi | Ut, mii) 


If we approximate p(m | zi, ui, 21.1) by p(m | zi) and observe that p(m) is a 
constant relative to the desired posterior, we obtain the desired equation as 
follows: 


I 


n p(m | zi) p(zi | Ut, £t—1) 
plz: | m) p(m) 


p(zi | Ut, Lt-1,™) 





= n =n plr | ut, 24-1) 
plz) 
a p(z: | m) p(zi | Ut, i1) 
=N 
p(zi) 
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Here 7 is the normalizer (notice that the value of ņ is different for the different 
steps in our transformation). This brief analysis shows that our map-based 
model is justified under the rough assumption that 


p(m | Lt, Ut, Le 1) = p(m | z) 


Obviously, these expressions are not equal. When computing the conditional 
over m, our approximation omits two terms: u; and x;_,. By omitting these 
terms, we discard any information relating to the robot's path leading up 
to z;. All we know is that its final pose is z,. We already noticed the conse- 
quences of this omission in our example above, when we observed that poses 
behind a wall may possess non-zero likelihood. Our approximate map-based 
motion model may falsely assume that the robot just went through a wall, as 
long as the initial and final poses are in the unoccupied space. How damag- 
ing can this be? As noted above, this depends on the update interval. In fact, 
for sufficiently high update rates, and assuming that the noise variables in 
the motion model are bounded, we can guarantee that the approximation is 
tight and this effect will not occur. 

This analysis illustrates a subtle insight pertaining to the implementation 
of the algorithm. In particular, one has to pay attention to the update fre- 
quency. A Bayes filter that is updated frequently might yield fundamentally 
different results than one that is updated only occasionally. 


Summary 


This section derived the two principal probabilistic motion models for mo- 
bile robots operating on the plane. 


e We derived an algorithm for the probabilistic motion model p(x; | 
Ut, $4.1) that represents control uw; by a translational and angular velocity, 
executed over a fixed time interval At. In implementing this model, we 
realized that two control noise parameters, one for the translational and 
one for the rotational velocity, are insufficient to generate a space-filling 
(non-generate) posterior. We therefore added a third noise parameter, ex- 
pressed as a noisy "final rotation." 


* We presented an alternative motion model that uses the robot's odometry 
as input. Odometric measurements were expressed by three parameters, 
an initial rotation, followed by a translation, and a final rotation. The 
probabilistic motion model was implemented by assuming that all three 
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of these parameters are subject to noise. We noted that odometry readings 
are technically not controls; however, by using them just like controls we 
arrived at a simpler formulation of the estimation problem. 


* For both motion models, we presented two types of implementations, 
one in which the probability p(z; | ut, z1.1) is calculated in closed form, 
and one that enables us to generate samples from p(x; | ut, z; 1). The 
closed-form expression accepts as an input x+, uz, and x-1, and outputs a 
numerical probability value. To calculate this probability, the algorithms 
effectively invert the motion model, to compare the actual with the com- 
manded control parameters. The sampling model does not require such an 
inversion. Instead, it implements a forward model of the motion model 
p(z | us, $11). It accepts as an input the values w and z,_; and outputs 
a random x; drawn according to p(z, | ut, x, .1). Closed-form models are 
required for some probabilistic algorithms. Others, most notably particle 
filters, utilize sampling models. 


* Finally we extended all motion models to incorporate a map of the envi- 
ronment. The resulting probability p(x; | ut, x11, m) incorporates a map 
m in its conditional. This extension followed the intuition that the map 
specifies where a robot may be, which has an effect of the ability to move 
from pose 2-1 to x+. The resulting algorithm was approximate, in that 
we only checked for the validity of the final pose. 


The motion models discussed here are only examples: Clearly, the field of 
robotic actuators is much richer than just mobile robots operating in flat ter- 
rain. Even within the field of mobile robotics, there exist a number of de- 
vices that are not covered by the models discussed here. Examples include 
holonomic robots which can move sideways, or cars with suspension. Our 
description also does not consider robot dynamics, which are important for 
fast-moving vehicles such as cars on highways. Most of these robots can be 
modeled analogously; simply specify the physical laws of robot motion, and 
specify appropriate noise parameters. For dynamic models, this will require 
extending the robot state by a velocity vector that captures the dynamic state 
of the vehicle. In many ways, these extensions are straightforward. 

As far as measuring ego-motion is concerned, many robots rely on inertial 
sensors to measure motion, as a supplement to or in place of odometry. En- 
tire books have been dedicated to filter design using inertial sensors. Readers 
are encouraged to include richer models and sensors when odometry is in- 
sufficient. 
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Bibliographical Remarks 


The present material extends the basic kinematic equations of specific types of mobile robots 
(Cox and Wilfong 1990) by a probabilistic component. Drives covered by our model are the 
differential drive, the Ackerman drive, and synchro-drive (Borenstein et al. 1996). Drive sys- 
tems not covered by our model are those without non-holonomic constraints (Latombe 1991) 
like robots equipped with Mecanum wheels (Ilon 1975) or even legged robots, as described in 
pioneering papers by Raibert et al. (1986); Raibert (1991); Saranli and Koditschek (2002). 

The field of robotics has studied robot motion and interaction with a robotic environment in 
much more depth. Contemporary texts on mobile robots covering aspects of kinematics and dy- 
namics are due to Murphy (2000c); Dudek and Jenkin (2000); Siegwart and Nourbakhsh (2004). 
Cox and Wilfong (1990) provides a collection of articles by leading researchers at the time of 
publication; see also Kortenkamp et al. (1998). Classical treatments of robotic kinematics and 
dynamics can be found in Craig (1989); Vukobratovié (1989); Paul (1981); and Yoshikawa (1990). 
A more modern text addressing robotic dynamics is the one by Featherstone (1987). Compliant 
motion as one form of environment interaction has been studied by Mason (2001). Terrame- 
chanics, which refers to the interaction of wheeled robots with the ground, has been studied in 
seminal texts by Bekker (1956, 1969) and Wong (1989). A contemporary text on wheel-ground 
interaction can be found in Iagnemma and Dubowsky (2004). Generalizing such models into a 
probabilistic framework is a promising direction for future research. 


Exercises 


1. All robot models in this chapter were kinematic. In this exercise, you 
will consider a robot with dynamics. Consider a robot that lives in a 1-D 
coordinate system. Its location will be denoted by 2, its velocity by z, 
and its acceleration by 3. Suppose we can only control the acceleration 
i. Develop a mathematical motion model that computes the posterior 
over the pose x’ and the velocity i' from an initial pose x and velocity t, 
assuming that the acceleration 2 is the sum of a commanded acceleration 
and a zero-mean Gaussian noise term with variance o? (and assume that 
the actual acceleration remains constant in the simulation interval At). 
Are z' and v’ correlated in the posterior? Explain why / why not. 


2. Consider again the dynamic robot from Exercise 1. Provide a mathemati- 
cal formula for computing the posterior distribution over the final velocity 
x’, from the initial robot location z, the initial velocity «, and the final pose 
z'. What is remarkable about this posterior? 


3. Suppose we control this robot with random accelerations for T time inter- 
vals, for some large value of T. Will the final location x and the velocity 
t be correlated? If yes, will they be fully correlated as T 1 oo, so that one 
variable becomes a deterministic function of the other? 
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4. Now consider a simple kinematic model of an idealized bicycle. Both tires 


are of diameter d, and are mounted to a frame of length l. The front tire 
can swivel around a vertical axis, and its steering angle will be denoted a. 
The rear tire is always parallel to the bicycle frame and cannot swivel. 


For the sake of this exercise, the pose of the bicycle shall be defined 
through three variables: the x-y location of the center of the front tire, and 
the angular orientation 0 (yaw) of the bicycle frame relative to an external 
coordinate frame. The controls are the forward velocity v of the bicycle, 
and the steering angle a, which we will assume to be constant during each 
prediction cycle. 


Provide the mathematical prediction model for a time interval At, assum- 
ing that it is subject to Gaussian noise in the steering angle a and the 
forward velocity v. The model will have to predict the posterior of the 
bicycle state after At time, starting from a known state. If you cannot find 
an exact model, approximate it, and explain your approximations. 


. Consider the kinematic bicycle model from Exercise 4. Implement a sam- 


pling function for posterior poses of the bicycles under the same noise 
assumptions. 


For your simulation, you might assume l = 100cm, d = 80cm, At = 
lsec |a| € 80°, v € [0; 100]cm/sec. Assume further that the variance of 
the steering angle is c2. = 25°? and the variance of the velocity is o2 = 
50cm? / sec? - v?. Notice that the variance of the velocity depends on the 
commanded velocity. 


For a bicycle starting at the origin, plot the resulting sample sets for the 
following values of the control parameters: 























problem number a v 
1 25? 20cm/sec 
2 —25°  20cm/sec 
3 25? 90cm/sec 
4 80° 10cm/sec 
1 85? | 90cm/sec 











All your plots should show coordinate axes with units. 


. Consider once again the kinematic bicycle model from Exercise 4. Given 


an initial state z, y, 0 and a final x’ and y’ (but no final 6’), provide a math- 
ematical formula for determining the most likely values of a, v, and 6’. If 
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7. 


you cannot find a closed form solution, you could instead give a technique 
for approximating the desired values. 


A common drive for indoor robots is holonomic. A holonomic robot has 
as many controllable degrees of freedom as the dimension of its config- 
uration (or pose) space. In this exercise, you are asked to generalize the 
velocity model to a holonomic robot operating in the plane. Assume the 
robot can control its forward velocity, an orthogonal sidewards velocity, 
and a rotational velocity. Let us arbitrarily give sidewards motion to the 
left positive values, and motion to the right negative values. 


e State a mathematical model for such a robot, assuming that its controls 
are subject to independent Gaussian noise. 
* Provide a procedure for calculating p(x; | ut, 24-1). 


* Provide a sampling procedure for sampling z, ~ p(x; | uz, 24-1). 


. Prove that the triangular distribution in Equation (5.12) has mean 0 and 


variance b?. Prove the same for the sampling algorithm in Table 5.4. 
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Introduction 


Environment measurement models comprise the second domain-specific model 
in probabilistic robotics, next to motion models. Measurement models de- 
scribe the formation process by which sensor measurements are generated in 
the physical world. Today’s robots use a variety of different sensor modal- 
ities, such as tactile sensors, range sensors, or cameras. The specifics of the 
model depends on the sensor: Imaging sensors are best modeled by pro- 
jective geometry, whereas sonar sensors are best modeled by describing the 
sound wave and its reflection on surfaces in the environment. 

Probabilistic robotics explicitly models the noise in sensor measurements. 
Such models account for the inherent uncertainty in the robot’s sensors. For- 
mally, the measurement model is defined as a conditional probability dis- 
tribution p(z; | z,, m), where z; is the robot pose, z; is the measurement at 
time t, and m is the map of the environment. Although we mainly address 
range-sensors throughout this chapter, the underlying principles and equa- 
tions are not limited to this type of sensors. Instead the basic principle can 
be applied to any kind of sensor, such as a camera or a bar-code operated 
landmark detector. 

To illustrate the basic problem of mobile robots that use their sensors to 
perceive their environment, Figure 6.1a shows a typical sonar range scan ob- 
tained in a corridor with a mobile robot equipped with a cyclic array of 24 
ultrasound sensors. The distances measured by the individual sensors are 
depicted in light gray and the map of the environment is shown in black. 
Most of these measurements correspond to the distance of the nearest ob- 
ject in the measurement cone; some measurements, however, have failed to 
detect any object. 


»js.on 000000 


150 


SPECULAR REFLECTION 


LASER RANGE SCAN 


6 Robot Perception 





(a) N (b) specular 
N reflection 








smooth 
object 
surface 





sensor 














Figure 6.1 (a) Typical ultrasound scan of a robot in its environment. (b) A misread- 
ing in ultrasonic sensing. This effect occurs when firing a sonar signal towards a 
reflective surface at an angle o that exceeds half the opening angle of the sensor. 


The inability for sonar to reliably measure range to nearby objects is of- 
ten paraphrased as sensor noise. Technically, this noise is quite predictable: 
When measuring smooth surfaces (such as walls), the reflection is usually 
specular, and the wall effectively becomes a mirror for the sound wave. This 
can be problematic when hitting a smooth surface at an angle. Here the echo 
may travel into a direction other than the sonar sensor, as illustrated in Fig- 
ure 6.1b. This effect often leads to overly large range measurements when 
compared to the true distance to the nearest object in the main cone. The 
likelihood of this to happen depends on a number of properties, such as the 
surface material, the angle between the surface normal and the direction of 
the sensor cone, the range of the surface, the width of the main sensor cone, 
and the sensitivity of the sonar sensor. Other errors, such as short readings, 
may be caused by cross-talk between different sensors (sound is slow!) or by 
unmodeled objects in the proximity of the robot, such as people. 

Figure 6.2 shows a typical laser range scan, acquired with a 2-D laser range 
finder. Laser is similar to sonar in that it also actively emits a signal and 
records its echo, but in the case of laser the signal is a light beam. A key 
difference to sonars is that lasers provide much more focused beams. The 
specific laser in Figure 6.2 is based on a time-of-flight measurement, and 
measurements are spaced in one degree increments. 

As a rule of thumb, the more accurate a sensor model, the better the 
results—though there are some important caveats that were already dis- 
cussed in Chapter 2.4.4. In practice, however, it is often impossible to model 
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Figure 6.2 A typical laser range scan, acquired with a SICK LMS laser. The envi- 
ronment shown here is a coal mine. Image courtesy of Dirk Hahnel, University of 
Freiburg. 


a sensor accurately, primarily due to the complexity of physical phenomena. 

Often, the response characteristics of a sensor depends on variables we 
prefer not to make explicit in a probabilistic robotics algorithm (such as the 
surface material of walls, which for no particular reason is commonly not 
considered in robotic mapping). Probabilistic robotics accommodates inac- 
curacies of sensor models in the stochastic aspects: By modeling the mea- 
surement process as a conditional probability density, p(z; | x+), instead of a 
deterministic function z; = f (x+), the uncertainty in the sensor model can be 
accommodated in the non-deterministic aspects of the model. Herein lies a 
key advantage of probabilistic techniques over classical robotics: in practice, 
we can get away with extremely crude models. However, when devising 
a probabilistic model, care has to be taken to capture the different types of 
uncertainties that may affect a sensor measurement. 

Many sensors generate more than one numerical measurement value 
when queried. For example, cameras generate entire arrays of values (bright- 
ness, saturation, color); similarly, range finders usually generate entire scans 
of ranges. We will denote the number of such measurement values within a 
measurement z; by K, hence we can write: 


a = ues) 


We will use z7 to refer to an individual measurement (e.g., one range value). 
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The probability p(z | x:,m) is obtained as the product of the individual 
measurement likelihoods 


K 
plz | zam) = ll» | ve, m) 
k=1 


Technically, this amounts to an independence assumption between the noise 
in each individual measurement beam—just as our Markov assumption as- 
sumes independent noise over time (c.f., Chapter 2.4.4). This assumption is 
only true in the ideal case. We already discussed possible causes of depen- 
dent noise in Chapter 2.4.4. To recapitulate, dependencies typically exist due 
to a range of factors: people, who often corrupt measurements of several ad- 
jacent sensors; errors in the model m; approximations in the posterior; and 
so on. For now, however, we will simply not worry about violations of the 
independence assumption, as we will return to this issue in later chapters. 


Maps 


To express the process of generating measurements, we need to specify the 
environment in which a measurement is generated. A map of the environ- 
ment is a list of objects in the environment and their locations. We have 
already informally discussed maps in the previous chapter, where we de- 
veloped robot motion models that took into consideration the occupancy of 
different locations in the world. Formally, a map m is a list of objects in the 
environment along with their properties: 


m = imi,ma,...,my) 


Here N is the total number of objects in the environment, and each m,, with 
1 < n € N specifies a property. Maps are usually indexed in one of two 
ways, known as feature-based and location-based. In feature-based maps, n is a 
feature index. The value of m, contains, next to the properties of a feature, 
the Cartesian location of the feature. In location-based maps, the index n 
corresponds to a specific location. In planar maps, it is common to denote a 
map element by m; instead of mn, to make explicit that mg is the property 
of a specific world coordinate, (x y). 

Both types of maps have advantages and disadvantages. Location-based 
maps are volumetric, in that they offer a label for any location in the world. 
Volumetric maps contain information not only about objects in the environ- 
ment, but also about the absence of objects (e.g., free-space). This is quite 
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different in feature-based maps. Feature-based maps only specify the shape of 
the environment at the specific locations, namely the locations of the objects 
contained in the map. Feature representation makes it easier to adjust the 
position of an object; e.g., as a result of additional sensing. For this reason, 
feature-based maps are popular in the robotic mapping field, where maps 
are constructed from sensor data. In this book, we will encounter both types 
of maps—in fact, we will occasionally move from one representation to the 
other. 

A classical map representation is known as occupancy grid map, which will 
be discussed in detail in Chapter 9. Occupancy maps are location-based: 
They assign to each z-y coordinate a binary occupancy value that speci- 
fies whether or not a location is occupied with an object. Occupancy grid 
maps are great for mobile robot navigation: They make it easy to find paths 
through the unoccupied space. 

Throughout this book, we will drop the distinction between the physical 
world and the map. Technically, sensor measurements are caused by physi- 
cal objects, not the map of those objects. However, it is tradition to condition 
sensor models on the map m; hence we will adopt a notation that suggests 
measurements depend on the map. 


Beam Models of Range Finders 


Range finders are among the most popular sensors in robotics. Our first mea- 
surement model in this chapter is therefore an approximative physical model 
of range finders. Range finders measure the range to nearby objects. Range 
may be measured along a beam—which is a good model of the workings 
of laser range finders—or within a cone—which is the preferable model of 
ultrasonic sensors. 


The Basic Measurement Algorithm 


Our model incorporates four types of measurement errors, all of which are 
essential to making this model work: small measurement noise, errors due 
to unexpected objects, errors due to failures to detect objects, and random 
unexplained noise. The desired model p(z; | z;, m) is therefore a mixture of 
four densities, each of which corresponds to a particular type of error: 


1. Correct range with local measurement noise. In an ideal world, a range 
finder would always measure the correct range to the nearest object in its 
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Figure 6.3 Components of the range finder sensor model. In each diagram the hor- 
izontal axis corresponds to the measurement zk, the vertical to the likelihood. 


measurement field. Let us use z¥* to denote the “true” range of the object 
measured by z¥. In location-based maps, the range z¥* can be determined 
using ray casting; in feature-based maps, it is usually obtained by search- 
ing for the closest feature within a measurement cone. However, even if 
the sensor correctly measures the range to the nearest object, the value it 
returns is subject to error. This error arises from the limited resolution 
of range sensors, atmospheric effect on the measurement signal, and so 
on. This measurement noise is usually modeled by a narrow Gaussian with 
mean z7* and standard deviation cpi. We will denote the Gaussian by 
Prit- Figure 6.3a illustrates this density pyit, for a specific value of z7*. 


In practice, the values measured by the range sensor are limited to the 
interval [0; zmax], where zmax denotes the maximum sensor range. Thus, 
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the measurement probability is given by 


Phi (2 | x: m) = n N (22; 28", obit) if 0 < ze < nus 
QUE ; 0 otherwise 


where z}* is calculated from zt and m via ray casting, and N (z7; zf , o2) 


denotes the univariate normal distribution with mean 27* and standard 


deviation onit: 


ko Rs 2 
1 ifi Ži ) 

ky ke 2 _ 2 o2. 

N (zz; z; Chit) => — TS € hit 


V 2roiit 
The normalizer 7 evaluates to 
Zmax -1 
n= ("Neha obs) at) 
0 


The standard deviation oni; is an intrinsic noise parameter of the measure- 
ment model. Below we will discuss strategies for setting this parameter. 


. Unexpected objects. Environments of mobile robots are dynamic, 


whereas maps m are static. As a result, objects not contained in the map 
can cause range finders to produce surprisingly short ranges—at least 
when compared to the map. A typical example of moving objects are peo- 
ple that share the operational space of the robot. One way to deal with 
such objects is to treat them as part of the state vector and estimate their 
location; another, much simpler approach, is to treat them as sensor noise. 
Treated as sensor noise, unmodeled objects have the property that they 
cause ranges to be shorter than 27*, not longer. 


The likelihood of sensing unexpected objects decreases with range. To 
see, imagine there are two people that independently and with the same 
fixed likelihood show up in the perceptual field of a proximity sensor. One 
person's range is rı, and the second person's range is r2. Let us further 
assume that rı < r2, without loss of generality. Then we are more likely 
to measure rı than r3. Whenever the first person is present, our sensor 
measures rı. However, for it to measure rs, the second person must be 
present and the first must be absent. 


Mathematically, the probability of range measurements in such situations 
is described by an exponential distribution. The parameter of this distribu- 
tion, Ashort, is an intrinsic parameter of the measurement model. Accord- 
ing to the definition of an exponential distribution we obtain the following 
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equation for Pshort (ZË | £+, m): 


一 Xshortzz i k k 
N Ashort € “SPO if 0 < zf < oz" 


k = 
Pshort (2; | v4, m). = 0 otherwise 


As in the previous case, we need a normalizer 7 since our exponential is 
limited to the interval |0; z/*]. Because the cumulative probability in this 
interval is given as 

kx 


t 
| Aden e het dub = Lg short" 4 eTÀshort0 
snor => 
0 


kx 
m ees e rshort 2; 


the value of 7 can be derived as: 


1 


k 
1 — e 一 Ashort28 * 


T] 一 


Figure 6.3b depicts this density graphically. This density falls off expo- 
nentially with the range z7. 


. Failures. Sometimes, obstacles are missed altogether. For example, this 


happens frequently for sonar sensors as a result of specular reflections. 
Failures also occur with laser range finders when sensing black, light- 
absorbing objects, or for some laser systems when measuring objects in 
bright sunlight. A typical result of a sensor failure is a max-range measure- 
ment: the sensor returns its maximum allowable value zmax. Since such 
events are quite frequent, it is necessary to explicitly model max-range 
measurements in the measurement model. 


We will model this case with a point-mass distribution centered at zmax: 


{ T. Ab Z= ee 


Dinter | z,, m) = I(z = zmax) 0 otherwise 


Here J denotes the indicator function that takes on the value 1 if its ar- 
gument is true, and is 0 otherwise. Technically, pmax does not possess a 
probability density function. This is because pmax is a discrete distribu- 
tion. However, this shall not worry us here, as our mathematical model of 
evaluating the probability of a sensor measurement is not affected by the 
non-existence of a density function. (In our diagrams, we simply draw 
Pmax aS a Very narrow uniform distribution centered at zmax, so that we 
can pretend a density exists). 
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Figure 6.4 "Pseudo-density" of a typical mixture distribution p(zf | £4, m). 


4. Random measurements. Finally, range finders occasionally produce en- 
tirely unexplainable measurements. For example, sonars often generate 
phantom readings when they bounce off walls, or when they are subject 
to cross-talk between different sensors. To keep things simple, such mea- 
surements will be modeled using a uniform distribution spread over the 
entire sensor measurement range [0; zmax]: 


1 
G 


Figure 6.3d shows the density of the distribution Prana- 





OSA <a 


k = 
Prand(2% | zt) = otherwise 


These four different distributions are now mixed by a weighted average, de- 
fined by the parameters zyit, zshorty Zmax; ANA Zang With Zpit + Zshort + 2max + 


Zrand = 1. 
T 
Zhit Prit (27 | t, m) 
k 
k Zshort Dshort( 24 | Tt m) 
Z. Tt m = 

p( : | n ) Zmax Dale, | z,,m) 
Zrand Pisae? | Tt, m) 


A typical density resulting from this linear combination of the individual 
densities is shown in Figure 6.4 (with our visualization of the point-mass 
distribution pmax as a small uniform density). As the reader may notice, the 
basic characteristics of all four basic models are still present in this combined 
density. 
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1: Algorithm beam_range_finder_model(z;, £4, m): 

2: q=1 

3: fork = 1 to K do 

4: compute z}* for the measurement z¥ using ray casting 
5: D = Zhit Pnit(ZP | ve, m) + Zshort * Pshort (27. | ze, m) 

6: 十 2max Dax (2b | £e, M) + Zrand * Prana (Zë | £e, m) 
7: q—4:p 

8: return q 











Table6.1 Algorithm for computing the likelihood of a range scan z;, assuming con- 
ditional independence between the individual range measurements in the scan. 


The range finder model is implemented by the algorithm 
beam range finder model in Table 6.1. The input of this algorithm is 
a complete range scan z+, a robot pose x+, and a map m. Its outer loop (lines 
2 and 7) multiplies the likelihood of individual sensor beams z7, following 
Equation (6.2). Line 4 applies ray casting to compute the noise-free range for 
a particular sensor measurement. The likelihood of each individual range 
measurement z¥ is computed in line 5, which implements the mixing rule for 
densities stated in (6.12). After iterating through all sensor measurements z7 
in z, the algorithm returns the desired probability p(z; | £4, m). 


Adjusting the Intrinsic Model Parameters 


In our discussion so far we have not addressed the question of how to choose 
the various parameters of the sensor model. These parameters include the 
mixing parameters zhit, Zshort; Zmax; ANd Zand. They also include the param- 
eters Chit and Ashort. We will refer to the set of all intrinsic parameters as ©. 
Clearly, the likelihood of any sensor measurement is a function of O. Thus, 
we will now discuss an algorithm for adjusting model parameters. 

One way to determine the intrinsic parameters is to rely on data. Fig- 
ure 6.5 depicts two series of 10,000 measurements obtained with a mobile 
robot traveling through a typical office environment. Both plots show only 
range measurements for which the expected range was approximately 3 me- 
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(a) Sonar data (b) Laser data 


500 mener sete sm 

















Figure 6.5 Typical data obtained with (a) a sonar sensor and (b) a laser-range sensor 
in an office environment for a “true” range of 300 cm and a maximum range of 500 
cm. 


ters (between 2.9m and 3.1m). The left plot depicts the data for sonar sensors, 
and the right plot the corresponding data for laser sensors. In both plots, the 
x-axis shows the number of the reading (from 1 to 10,000), and the y-axis is 
the range measured by the sensor. 

Whereas most of the measurements are close to the correct range for both 
sensors, the behaviors of the sensors differ substantially. The ultrasound 
sensor appears to suffer from many more measurement noise and detection 
errors. Quite frequently it fails to detect an obstacle, and instead reports max- 
imum range. In contrast, the laser range finder is more accurate. However, it 
also occasionally reports false ranges. 

A perfectly acceptable way to set the intrinsic parameters O is by hand: 
simply eyeball the resulting density until it agrees with your experience. 
Another, more principled way is to learn these parameters from actual data. 
This is achieved by maximizing the likelihood of a reference data set Z = {z;} 
with associated positions X = (x;) and map m, where each z; is an actual 
measurement, x; is the pose at which the measurement was taken, and m is 
the map. The likelihood of the data Z is given by 


p(Z | X, m, ©) 


Our goal is to identify intrinsic parameters O that maximize this likelihood. 
Any estimator, or algorithm, that maximizes the likelihood of data is known 
as a maximum likelihood estimator, or ML estimator. 

Table 6.2 depicts the algorithm learn intrinsic parameters, which is an 
algorithm for calculating the maximum likelihood estimate for the intrinsic 
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1: Algorithm learn intrinsic parameters(Z, X, m): 
2: repeat until convergence criterion satisfied 
for all z; in Z do 
4: n = | Prit (zi | £i, m) + Pshort (2 | £i, m) 
十 Dmax (Zi | Ti, m) + Prana (ži | Ti, m) ]* 
5: calculate z; 
6: Ci hit = 7) Pit (zi | vi, m) 
7: €i short = 7) Pshort (zi | Ti, m) 
8: ĉi, max — T] Pmax(Ži | Ti, m) 
9: €i rand — 7] Prana (Zi | £i, m) 
10: Zu = |Z|! Y ein 
11: Zshort — |Z|-1 ss €i short 
12: Zmax 一 A bee €i max 
13: Zrand = |Z 2. €i rand 
14: Chit = yy ei hit (Zi E gr" 
. Ci,short 
15: Ashort m — Zi 
16: return © = {Znit, Zshort: “max; “rand; C hit; Ashort } 








Table 6.2 Algorithm for learning the intrinsic parameters of the beam-based sensor 
model from data. 


parameters. As we shall see below, the algorithm is an instance of the expec- 
tation maximization (EM) algorithm, an iterative procedure for estimating ML 
parameters. 

Initially, the algorithm learn_intrinsic_parameters in Table 6.2 requires 
a good initialization of the intrinsic parameters ei; and Ashort. In lines 3 
through 9, it estimates auxiliary variables: Each e; xxx is the probability that 
the measurement z; is caused by “xxx,” where “xxx” is chosen from the four 
aspects of the sensor model, hit, short, max, and random. Subsequently, it es- 
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timates the intrinsic parameters in lines 10 through 15. The intrinsic parame- 
ters, however, are a function of the expectations calculated before. Adjusting 
the intrinsic parameters causes the expectations to change, for which reason 
the algorithm has to be iterated. However, in practice the iteration converges 
quickly, and a dozen iterations are usually sufficient to give good results. 

Figure 6.6 graphically depicts four examples of data and the ML measure- 
ment model calculated by learn_intrinsic_parameters. The first row shows 
approximations to data recorded with the ultrasound sensor. The second row 
contains plots of two functions generated for laser range data. The columns 
correspond to different “true” ranges. The data is organized in histograms. 
One can clearly see the differences between the different graphs. The smaller 
the range z/* the more accurate the measurement. For both sensors the 
Gaussians are narrower for the shorter range than they are for the longer 
measurement. Furthermore, the laser range finder is more accurate than the 
ultrasound sensor, as indicated by the narrower Gaussians and the smaller 
number of maximum range measurements. The other important thing to no- 
tice is the relatively high likelihood of short and random measurements. This 
large error likelihood has a disadvantage and an advantage: On the negative 
side, it reduces the information in each sensor reading, since the difference in 
likelihood between a hit and a random measurement is small. On the posi- 
tive side this model is less susceptible to unmodeled systematic perturbations, 
such as people who block the robot's path for long periods of time. 

Figure 6.7 illustrates the learned sensor model in action. Shown in Fig- 
ure 6.7a is a 180 degree range scan. The robot is placed in a previously 
acquired occupancy grid map at its true pose. Figure 6.7b plots a map of 
the environment along with the likelihood p(z; | z;, m) of this range scan 
projected into x-y-space (by maximizing over the orientation 0). The darker 
a location, the more likely it is. As is easily seen, all regions with high likeli- 
hood are located in the corridor. This comes at little surprise, as the specific 
scan is geometrically more consistent with corridor locations than with loca- 
tions inside any of the rooms. The fact that the probability mass is spread out 
throughout the corridor suggests that a single sensor scan is insufficient to 
determine the robot's exact pose. This is largely due to the symmetry of the 
corridor. The fact that the posterior is organized in two narrow horizontal 
bands is due to the fact that the orientation of the robot is unknown: each of 
these bands corresponds to one of the two surviving heading directions of 
the robot. 
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(a) Sonar data, plots for two different ranges 
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(b) Laser data, plots for two different ranges 



























































Z 2 
Figure 6.6 Approximation of the beam model based on (a) sonar data and (b) laser 


range data. The sensor models depicted on the left were obtained by a maximum 
likelihood approximation to the data sets depicted in Figure 6.5. 


Mathematical Derivation of the Beam Model 


To derive the ML estimator, it shall prove useful to introduce auxiliary vari- 
ables c;, the so-called correspondence variable. Each c; can take on one of 
four values, hit, short, max, and random, corresponding to the four possible 
mechanisms that might have produced a measurement z;. 

Let us first consider the case in which the c;'s are known. We know which 
of the four mechanisms described above caused each measurement z;. Based 
on the values of the c;'s, we can decompose Z into four disjoint sets, Znit, 
Zshort; Zmax; ANd Orand Which together comprise the set Z. The ML estima- 
tors for the intrinsic parameters znit, Zshort, Zmax, ANd Zrana are simply the 
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(a) Laser scan and part of the map 
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(b) Likelihood for different positions 


Figure 6.7 Probabilistic model of perception: (a) Laser range scan, projected into a 
previously acquired map m. (b) The likelihood p(z; | x+, m), evaluated for all posi- 
tions x, and projected into the map (shown in gray). The darker a position, the larger 


p(2 | zi, m). 


normalized ratios: 


Zhit |Znit | 
Zshort = Iz |Zsnortl 
Zmax |Zmax| 
Zrand |Zranal 


The remaining intrinsic parameters, Chit and Ashort, are obtained as follows. 
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For the data set Zhit we get from (6.5) 


P(Zpit | X,m,O) = Il Pnit (zi | vi, m, 9) 


Zi€ Zhit 


2 
Zi€ Zhit V2rcii 


Here z; is the "true" range, computed from the pose Zi and the map m. A 
classic trick of ML estimation is to maximize the logarithm of the likelihood, 
instead of the likelihood directly. The logarithm is a strictly monotonic func- 
tion, hence the maximum of the log-likelihood is also the maximum of the 
original likelihood. The log-likelihood is given by 


1 1 
log p(Znit | X, m,0) = 5 |- 7 log 2rchit 一 z 
which is now easily transformed as follows 


log p(Znit | X, m, 9) 
1 (z — 27)? 
mS 5 m 2nogu 十 PM 


Zi€ Znit hit 





1 
2 





(zi — 27)? 
|Zui«| log 27 十 2|Znit| log Chit d 5 5 


cé. 
Zi € Zhit hit 


1 
= const. — |Znit| log dnit 一 302. 5 (zi — zy 
Thit Zi€Zpit 


The derivative of this expression in the intrinsic parameter oy; is as follows: 
ð log p(Znit | X, m, 9) = |Zhitl + T 
Oa: hit Chit 





(zi A). 


3 
Thit Zi€Zpit 
The maximum of the log-likelihood is now obtained by setting this derivative 
to zero. From that we get the solution to our ML estimation problem. 


1 
Chit = TIT 5 (z= 27)? 
|ui] Zi € Zhit 


The estimation of the remaining intrinsic parameter Ashort proceeds just 
about in the same way. The posterior over the data Zshort is given by 


D(Zshort | X,m, 9) S Il Dshort (Zi | zi m) 


Zi€ Zshort 
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= l | Ahon eT Ashort 24 


Zi€ Zshort 
The logarithm is 
log p( Zznort | X, Th, ©) = 5 log Ashort T Ashort Zi 
Zi€ Zshort 
= |Z short log Ashort mn Ashort 5 Zi 





Zi E Z short 


The first derivative of this expression with respect to the intrinsic parameter 
Ashort is as follows: 





0 log p(Zshort | X, mM, 9) |Zshort | Y 
ei 


Or 入 
short short Zi€ Zehort 


Setting this to zero gives us the ML estimate for the intrinsic parameter Ashort 


入 zs |Zsnort | 
oe De ee Ži 


This derivation assumed knowledge of the parameters c;. We now extend it 
to the case where the c;'s are unknown. As we shall see, the resulting ML 
estimation problem lacks a closed-form solution. However, we can devise a 
technique that iterates two steps, one that calculates an expectation for the 
c;'S and one that computes the intrinsic model parameters under these ex- 
pectations. As noted, the resulting algorithm is an instance of the expectation 
maximization algorithm, usually abbreviated as EM. 

To derive EM, it will be beneficial to define the likelihood of the data Z 
first: 


log p(Z | X, m, ©) 
EE 5 log p(zi | Ti, M, 9) 


ZACZ 
= ps log pit (zi | vi, m) + 5 log Pshort (2i | £i, m) 
Zi€ Znit Zi€ Zshort 
js 5 log Pmax (Zi | Ti, m) + 5 log Prana (ži | Ti, m) 
Zi€ Zmax Zi € Zrand 


This expression can be rewritten using the variables c;: 


logp(Z | X,m,©) = M; Ie; = hit) logpua(z; | vi, m) 
ZCZ 
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+I(c; = short) log pshort (zi | zi, m) 


+H (ci = max) log Pmax (Zi | Ti, m) 





+I (ci = rand) log Prana (zi | £i, m) 


where J is the indicator function. Since the values for c; are unknown, it is 
common to integrate them out. Put differently, EM maximizes the expecta- 
tion Ellog p(Z | X, m, 9)], where the expectation is taken over the unknown 
variables c;: 


Ellog p(Z | X, m, 8| 
= xci = hit) log pnic(z; | zi, m) + p(c; = short) log Pshort (zi | xi, m) 


LU 
+p(ci = max) log Pmax (Zi | Ti, m) + ple; 去 rand) log Prana (Zi | Ti, m) 
=: >》 einit log pnit(2 | i, M) + ei short log Pshort (2i | £i, m) 
i 
+Ei max log Pmax (Zi | zi, m) + Ci rand log Prana (ži | Ti, m) 
With the definition of the variable e as indicated. This expression is maxi- 


mized in two steps. In a first step, we consider the intrinsic parameters ahit 
and Ashort given and calculate the expectation over the variables c;. 


ei hit p(c; = hit) Dhit (2i | v, m) 
Ci short " p(ci = short) m Pshort (zi | Ti, m) 
ĉi max p(ci = max) Dmax (Zi | Ti, m) 
€i rand p(ci X rand) Prana (Zi | Ti, m) 


The normalizer is given by 


[ pnit (zi | £i, M) + Pshort (Zi | vi, m) 


+HPmax(ži | Ti, m) + Prana (2i | Zi m) ] 


T] 一 
e 


This step is called the “E-step,” indicating that we calculate expectations over 
the latent variables c;. The remaining step is now straightforward, since the 
expectations decouple the dependencies between the different components 
of the sensor model. First, we note that the ML mixture parameters are sim- 
ply the normalized expectations 


Zhit Ci hit 
Zshort Es €i short 
= ry 
Zmax i €i max 
Zrand €i rand 
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The ML parameters opit and Asnort are then obtained analogously, by replac- 
ing the hard assignments in (6.19) and (6.23) by soft assignments weighted 
by the expectations. 


1 
Ohit = = X ei nit (zi — 27)? 
Saez Ci hit ziEZ 


and 





入 P €i short 
short 
Deg Ci short Ži 


Practical Considerations 


In practice, computing the densities of all sensor readings can be quite in- 
volved from a computational perspective. For example, laser range scanners 
often return hundreds of values per scan, at a rate of several scans per sec- 
ond. Since one has to perform a ray casting operation for each beam of the 
scan and every possible pose considered, the integration of the whole scan 
into the current belief cannot always be carried out in real-time. One typi- 
cal approach to solve this problem is to incorporate only a small subset of 
all measurements (e.g., 8 equally spaced measurements per laser range scan 
instead of 360). This approach has an important additional benefit. Since 
adjacent beams of a range scan are often not independent, the state estima- 
tion process becomes less susceptible to correlated noise in adjacent mea- 
surements. 

When dependencies between adjacent measurements are strong, the ML 
model may make the robot overconfident and yield suboptimal results. One 
simple remedy is to replace p(z7 | z;, m) by a “weaker” version p(z¥ | z,, m)? 
for a < 1. The intuition here is to reduce, by a factor of a, the information 
extracted from a sensor measurement (the log of this probability is given by 
a log p(zk | x+, m)). Another possibility—which we will only mention here— 
is to learn the intrinsic parameters in the context of the application: For exam- 
ple, in mobile localization it is possible to train the intrinsic parameters via 
gradient descent to yield good localization results over multiple time steps. 
Such a multi-time step methodology is significantly different from the single 
time step ML estimator described above. In practical implementations it can 
yield superior results; see Thrun (19982). 

The main drain of computing time for beam-based models is the ray cast- 
ing operation. The runtime costs of computing p(z, | zx, m) can be sub- 
stantially reduced by pre-cashing the ray casting algorithm, and storing the 
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result in memory—so that the ray casting operation can be replaced by a 
(much faster) table lookup. An obvious implementation of this idea is to de- 
compose the state space into a fine-grained three-dimensional grid, and to 
pre-compute the ranges z¥* for each grid cell. This idea was already investi- 
gated in Chapter 4.1. Depending on the resolution of the grid, the memory 
requirements can be significant. In mobile robot localization, we find that 
pre-computing the range with a grid resolution of 15 centimeters and 2 de- 
grees works well for indoor localization problems. It fits well into the RAM 
for moderate-sized computers, yielding speed-ups by an order of magnitude 
over the plain implementation that casts rays online. 


Limitations of the Beam Model 


The beam-based sensor model, while closely linked to the geometry and 
physics of range finders, suffers two major drawbacks. 

In particular, the beam-based model exhibits a lack of smoothness. In clut- 
tered environments with many small obstacles, the distribution p(z* | zi, m) 
can be very unsmooth in x;. Consider, for example, an environment with 
many chairs and tables (like a typical conference room). A robot like the ones 
shown in Chapter 1 will sense the legs of those obstacles. Obviously, small 
changes of a robot's pose x, can have a tremendous impact on the correct 
range of a sensor beam. As a result, the measurement model p(z7 | z+, m) is 
highly discontinuous in x+. The heading direction 6, is particularly affected, 
since small changes in heading can cause large displacements in x-y-space at 
a range. 

Lack of smoothness has two problematic consequences. First, any approx- 
imate belief representation runs the danger of missing the correct state, as 
nearby states might have drastically different posterior likelihoods. This 
poses constraints on the accuracy of the approximation which, if not met, 
increase the resulting error in the posterior. Second, hill climbing methods 
for finding the most likely state are prone to local minima, due to the large 
number of local maxima in such unsmooth models. 

The beam-based model is also computational involved. Evaluating p(z}' | 
x1, M) for each single sensor measurement z¥ involves ray casting, which is 
computationally expensive. As noted above, the problem can be partially 
remedied by pre-computing the ranges over a discrete grid in pose space. 
Such an approach shifts the computation into an initial off-line phase, with 
the benefit that the algorithm is faster at run time. However, the resulting 
tables are very large, since they cover a large three-dimensional space. Thus, 
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pre-computing ranges is computationally expensive and requires substantial 
memory. 


Likelihood Fields for Range Finders 


Basic Algorithm 


We will now describe an alternative model, called likelihood field, which over- 
comes these limitations. This model lacks a plausible physical explanation. 
In fact, it is an “ad hoc” algorithm that does not necessarily compute a condi- 
tional probability relative to any meaningful generative model of the physics 
of sensors. However, the approach works well in practice. The resulting pos- 
teriors are much smoother even in cluttered space, and the computation is 
more efficient. 

The key idea is to first project the end points of a sensor scan z; into the 
global coordinate space of the map. To do so, we need to know where rel- 
ative to the global coordinate frame the robot's local coordinate system is 
located, where on the robot the sensor beam z% originates, and where the 
sensor points. As usual let x, = (x y 0)" denote a robot pose at time t. 
Keeping with our two-dimensional view of the world, we denote the rela- 
tive location of the sensor in the robot's fixed, local coordinate system by 
(£k sens Yk,sens)”, and the angular orientation of the sensor beam relative to 
the robot's heading direction by 0% sens. These values are sensor-specific. The 
end point of the measurement z7 is now mapped into the global coordinate 
system via the obvious trigonometric transformation. 


Ck x cosÜ 一 Sinb0 Cheers x (| cos(0 + Ok sens) 
t = + ; ^ 十 Zt s ' 

| UM (2) pu cos 0 ) ( Uk sens ) ( sin(0 + i, sens) ) 
These coordinates are only meaningful when the sensor detects an obstacle. 
If the range sensor takes on its maximum value zE = zmax, these coordinates 
have no meaning in the physical world (even though the measurement does 
carry information). The likelihood field measurement model simply discards 
max-range readings. 

Similar to the beam model discussed before, we assume three types of 
sources of noise and uncertainty: 





1. Measurement noise. Noise arising from the measurement process is 
modeled using Gaussians. In x-y-space, this involves finding the nearest 
obstacle in the map. Let dist denote the Euclidean distance between the 
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(a) example environment (b) likelihood field 
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Figure6.8 (a) Example environment with three obstacles (gray). The robot is located 
towards the bottom of the figure, and takes a measurement zE as indicated by the 
dashed line. (b) Likelihood field for this obstacle configuration: the darker a location, 
the less likely it is to perceive an obstacle there. The probability p(z? | x+, m) for the 


specific sensor beam is shown in Figure 6.9. 


(a) Prit (zh | zt, m) (b) plz | zt, m) 














01 02 O3  Zmax 01 02 03 2max 
Figure 6.9 (a) Probability pnit (27) as a function of the measurement z*, for the sit- 
uation depicted in Figure 6.8. Here the sensor beam passes by three obstacles, with 
respective nearest points 01, 02, and os. (b) Sensor probability p(z? | zi, m), obtained 
for the situation depicted in Figure 6.8, obtained by adding two uniform distributions. 
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measurement coordinates (a, Yz} )T and the nearest object in the map m. 
Then the probability of a sensor measurement is given by a zero-centered 
Gaussian, which captures the sensor noise: 


Prit (zë | zm) = Eo, (dist) 


Figure 6.8a depicts a map, and Figure 6.8b shows the corresponding Gaus- 
sian likelihood for measurement points (Za Yer)” in 2-D space. The 
brighter a location, the more likely it is to measure an object with a range 
finder. The density Phit is now obtained by intersecting (and normaliz- 
ing) the likelihood field by the sensor axis, indicated by the dashed line in 
Figure 6.8. The resulting function is the one shown in Figure 6.9a. 


2. Failures. As before, we assume that max-range readings have a distinct 
large likelihood. As before, this is modeled by a point-mass distribution 


Pmax- 


3. Unexplained random measurements. Finally, a uniform distribution 
Prana is used to model random noise in perception. 


Just as for the beam-based sensor model, the desired probability p(z7 | z;, m) 
integrates all three distributions: 


Zhit ` Dhit + Zrand* Prand + Zmax ° Pmax 


using the familiar mixing weights zpit, Zrana, and Zmax. Figure 6.9b shows 
an example of the resulting distribution p(z} | z;, m) along a measurement 
beam. It should be easy to see that this distribution combines phit, as shown 
in Figure 6.9a, and the distributions Pmax and Prana. Much of what we said 
about adjusting the mixing parameters transfers over to our new sensor 
model. They can be adjusted by hand or learned using the ML estimator. 
A representation like the one in Figure 6.8b, which depicts the likelihood of 
an obstacle detection as a function of global x-y-coordinates, is called the 
likelihood field. 

Table 6.3 provides an algorithm for calculating the measurement proba- 
bility using the likelihood field. The reader should already be familiar with 
the outer loop, which multiplies the individual values of p(z? | z;, m), as- 
suming independence between the noise in different sensor beams. Line 4 
checks if the sensor reading is a max range reading, in which case it is simply 
ignored. Lines 5 to 8 handle the interesting case: Here the distance to the 
nearest obstacle in z-y-space is computed (line 7), and the resulting likeli- 
hood is obtained in line 8 by mixing a normal and a uniform distribution. As 
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1: Algorithm likelihood_field_range_finder_model(z;, x+, m): 








2: q—1 

3: for all k do 

4: if zË X Zmax 

5: v =T 十 Zk,sens COS Ô — Yk sens Sin 0 + zE cos(0 + Ok,sens) 

6: Vu =Y + Yk sens COS Ô + Lk,sens Sin 0 + zk sin(0 + 0i sens) 

7: dist — min Ue —g) 十 (Yer — y’)?| (x', y') occupied in m} 





q=q:; (znit - prob(dist, onit) + Žrandom ) 


Zmax 


return q 











Table 6.3 Algorithm for computing the likelihood of a range finder scan using Eu- 
clidean distance to the nearest neighbor. The function prob(dist, anit) computes the 
probability of the distance under a zero-centered Gaussian distribution with standard 
deviation oni. 


before, the function prob(dist, ohit) computes the probability of dist under a 
zero-centered Gaussian distribution with standard deviation onhit . 

The search for the nearest neighbor in the map (line 7) is the most costly 
operation in algorithm likelihood field range finder model. To speed up 
this search, it is advantageous to pre-compute the likelihood field, so that 
calculating the probability of a measurement amounts to a coordinate trans- 
formation followed by a table lookup. Of course, if a discrete grid is used, the 
result of the lookup is only approximate, in that it might return the wrong 
obstacle coordinates. However, the effect on the probability p(zF | z;, m) is 
typically small even for moderately course grids. 





Extensions 


A key advantage of the likelihood field model over the beam-based model 
discussed before is smoothness. Due to the smoothness of the Euclidean 
distance, small changes in the robot's pose x; only have small effects on the 
resulting distribution p(z? | z;, m). Another key advantage is that the pre- 
computation takes place in 2-D, instead of 3-D, increasing the compactness 
of the pre-computed information. 
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Figure 6.10 (a) Occupancy grid map of the San Jose Tech Museum, (b) pre-processed 
likelihood field. 


However, the current model has three key disadvantages: First, it does not 
explicitly model people and other dynamics that might cause short readings. 
Second, it treats sensors as if they can “see through walls.” This is because 
the ray casting operation was replaced by a nearest neighbor function, which 
is incapable of determining whether a path to a point is intercepted by an 
obstacle in the map. And third, our approach does not take map uncertainty 
into account. In particular, it cannot handle unexplored areas, for which the 
map is highly uncertain or unspecified. 

The basic algorithm likelihood_field_range_finder_model can be ex- 
tended to diminish the effect of these limitations. For example, one might 
sort map occupancy values into three categories: occupied, free, and unknown, 
instead of just the first two. When a sensor measurement z7 falls into the 
category unknown, its probability p(z7 | z;, m) is assumed to be the constant 
value ae! The resulting probabilistic model is crude. It assumes that in the 
unexplored space every sensor measurement is equally likely. 

Figure 6.10 shows a map and the corresponding likelihood field. Here 
again the gray-level of an z-y-location indicates the likelihood of receiving a 
sensor reading there. The reader may notice that the distance to the nearest 
obstacle is only employed inside the map, which corresponds to the explored 
terrain. Outside, the likelihood p(z¥ | £+, m) is a constant. For computational 
efficiency, it is worthwhile to pre-compute the nearest neighbor for a fine- 
grained 2-D grid. 

Likelihood fields over the visible space can also be defined for the most 
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(a) sensor scan (b) likelihood field 
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Figure 6.11 (a) Sensor scan, from a bird's eye perspective. The robot is placed at the 
bottom of this figure, generating a proximity scan that consists of the 180 dots in front 
of the robot. (b) Likelihood function generated from this sensor scan. The darker a 
region, the smaller the likelihood for sensing an object there. Notice that occluded 
regions are white, hence infer no penalty. 


recent scan, which in fact defines a local map. Figure 6.11 shows such a 
likelihood field. It plays an important role in techniques that align individual 
scans. 


Correlation-Based Measurement Models 


There exist a number of range sensor models in the literature that measure 
correlations between a measurement and the map. A common technique is 
known as map matching. Map matching requires techniques discussed in later 
chapters of this book, namely the ability to transform scans into occupancy 
maps. Typically, map matching compiles small numbers of consecutive scans 
into local maps, denoted mis :4;. Figure 6.12 shows such a local map, here in 
the form of an occupancy grid map. The sensor measurement model com- 
pares the local map mioca1 to the global map m, such that the more similar 
m and Miocai, the larger p(miocai | £t, m). Since the local map is represented 
relative to the robot location, this comparison requires that the cells of the 
local map are transformed into the coordinate framework of the global map. 
Such a transformation can be done similar to the coordinate transform (6.32) 
of sensor measurements used in the likelihood field model. If the robot is 
at location z;, we denote by m;,,15c(z;) the grid cell in the local map that 
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Figure 6.12 Example of a local map generated from 10 range scans, one of which is 
shown. 


corresponds to (x y)? in global coordinates. Once both maps are in the same 
reference frame, they can be compared using the map correlation function, 
which is defined as follows: 


Žr y May mE m) E (Mae decal 53) > m) 


V 2s y May = M)? Dy yl Mey tocat (à) — M)? 


Here the sum is evaluated over cells defined in both maps, and m is the 
average map value: 








Pm,mioca,©t 


Fy tng + ) 
i Nae Mey Tli; yj local 
2N cp 
where N denotes the number of elements in the overlap between the local 
and global map. The correlation Pm,miocai;x; scales between +1. Map match- 


ing interprets the value 





p(miocai | Tt, m) = MaXx{Pm,micea,te» 0) 


as the probability of the local map conditioned on the global map m and the 
robot pose x+. If the local map is generated from a single range scan z+, this 
probability substitutes the measurement probability p(z; | zi, m). 

Map matching has a number of nice properties: just like the likelihood field 
model, it is easy to compute, though it does not yield smooth probabilities in 
the pose parameter z;. One way to approximate the likelihood field (and to 
obtain smoothness) is to convolve the map m with a Gaussian smoothness 
kernel, and to run map matching on this smoothed map. 


5ris.em. QOOOOO 


176 


6.6 


6.6.1 


FEATURES 


6 Robot Perception 


A key advantage of map matching over likelihood fields is that it explic- 
itly considers the free-space in the scoring of two maps; the likelihood field 
technique only considers the end point of the scans, which by definition cor- 
respond to occupied space (or noise). On the other hand, many mapping 
techniques build local maps beyond the reach of the sensors. For example, 
many techniques build circular maps around the robot, setting to 0.5 areas 
beyond the range of actual sensor measurements. In such cases, there is a 
danger that the result of map matching incorporates areas beyond the actual 
measurement range, as if the sensor can “see through walls.” Such side- 
effects are found in a number of implemented map matching techniques. 

A further disadvantage is that map matching does not possess a plausi- 
ble physical explanation. Correlations are the normalized quadratic distance 
between maps, which is not the noise characteristic of range sensors. 


Feature-Based Measurement Models 


Feature Extraction 


The sensor models discussed thus far are all based on raw sensor measure- 
ments. An alternative approach is to extract features from the measurements. 
If we denote the feature extractor as a function f, the features extracted from 
a range measurement are given by f(z). Most feature extractors extract a 
small number of features from high-dimensional sensor measurements. A 
key advantage of this approach is the enormous reduction of computational 
complexity: While inference in the high-dimensional measurement space can 
be costly, inference in the low-dimensional feature space can be orders of 
magnitude more efficient. 

The discussion of specific algorithms for feature extraction is beyond the 
scope of this book. The literature offers a wide range of features for a number 
of different sensors. For range sensors, it is common to identify lines, corners, 
or local minima in range scans, which correspond to walls, corners, or objects 
such as tree trunks. When cameras are used for navigation, the processing of 
camera images falls into the realm of computer vision. Computer vision has 
devised a myriad of feature extraction techniques from camera images. Pop- 
ular features include edges, corners, distinct patterns, and objects of distinct 
appearance. In robotics, it is also common to define places as features, such 
as hallways and intersections. 
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Landmark Measurements 


In many robotics applications, features correspond to distinct objects in the 
physical world. For example, in indoor environments features may be door 
posts or windowsills; outdoors they may correspond to tree trunks or corners 
of buildings. In robotics, it is common to call those physical objects landmarks, 
to indicate that they are being used for robot navigation. 

The most common model for processing landmarks assumes that the sen- 
sor can measure the range and the bearing of the landmark relative to the 
robot's local coordinate frame. Such sensors are called range and bearing 
sensors. The existence of a range-bearing sensor is not an implausible as- 
sumption: Any local feature extracted from range scans come with range and 
bearing information, as do visual features detected by stereo vision. In addi- 
tion, the feature extractor may generate a signature. In this book, we assume 
a signature is a numerical value (e.g., an average color); it may equally be 
an integer that characterizes the type of the observed landmark, or a multi- 
dimensional vector characterizing a landmark (e.g., height and color). 

If we denote the range by r, the bearing by ¢, and the signature by s, the 
feature vector is given by a collection of triplets 


H r? 
f) = dotes) = { i E H yere) 
si 5 


The number of features identified at each time step is variable. However, 
many probabilistic robotic algorithms assume conditional independence be- 
tween features 


p) pz m) = [ peri gisi lenm) 


Conditional independence applies if the noise in each individual measure- 
ment (ri $i st)? is independent of the noise in other measurements 
(r? dj si)" (fori Æ j). Under the conditional independence assumption, 
we can process one feature at a time, just as we did in several of our range 
measurement models. This makes it much easier to develop algorithms that 
implement probabilistic measurement models. 

Let us now devise a sensor model for features. In the beginning of this 
chapter, we distinguished between two types of maps: feature-based and 
location-based. Landmark measurement models are usually defined only for 
feature-based maps. The reader may recall that those maps consist of lists 
of features, m = (mi, m»,...). Each feature may possess a signature and a 
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location coordinate. The location of a feature, denoted mj; and m; y, is simply 
its coordinate in the global coordinate frame of the map. 

The measurement vector for a noise-free landmark sensor is easily speci- 
fied by the standard geometric laws. We will model noise in landmark per- 
ception by independent Gaussian noise on the range, bearing, and the signa- 
ture. The resulting measurement model is formulated for the case where the 
i-th feature at time t corresponds to the j-th landmark in the map. As usual, 
the robot pose is given by x; = (x y 0)T. 





ri V (Mj — 2)? + (mj, — y)? Eo? 
i = atan2(Mjy —y,mjs —z)—0 |+| E 
St 5j £g2 


Here €,,, Ecg, and €,, are zero-mean Gaussian error variables with standard 
deviations or, 7g, and os, respectively. 


Sensor Model with Known Correspondence 


A key problem for range/bearing sensors is known as the data association 
problem. This problem arises when landmarks cannot be uniquely identified, 
so that some residual uncertainty exists with regards to the identity of a land- 
mark. For developing a range/bearing sensor model, it shall prove useful to 
introduce a correspondence variable between the feature f? and the landmark 
m; in the map. This variable will be denoted by cj with cj € {1,...,N +1}; 
N is the number of landmarks in the map m. If e = j < N , then the i-th 
feature observed at time t corresponds to the j-th landmark in the map. In 
other words, cj is the true identity of an observed feature. The only excep- 
tion occurs with ci = N + 1: Here a feature observation does not correspond 
to any feature in the map m. This case is important for handling spurious 
landmarks; it is also of great relevance for the topic of robotic mapping, in 
which the robot may encounter previously unobserved landmarks. 

Table 6.4 depicts the algorithm for calculating the probability of a feature 
fi with known correspondence c < N. Lines 3 and 4 calculate the true 
range and bearing to the landmark. The probability of the measured ranges 
and bearing is then calculated in line 5, assuming independence in the noise. 
As the reader easily verifies, this algorithm implements Equation (6.40). 
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1: Algorithm landmark model known correspondence(//, c, £+, m): 
2: ge 

3: f = /(mj,-—z) + (Mjy — y)? 

4: $ = atan2(m;,, — Y, mj, — T) 

5: q = prob(r! — 7, 0,) - prob(ói — $, ag): prob(si — sj, os) 

6: return q 











Table 6.4 Algorithm for computing the likelihood of a landmark measurement. The 
algorithm requires as input an observed feature ff = (rj ¢ si)", and the true iden- 
tity of the feature cj, the robot pose a = (x y 0)”, and the map m. Its output is the 
numerical probability p(f? | c, m, x+). 


Sampling Poses 


Sometimes it is desirable to sample robot poses x; that correspond to a mea- 
surement ff with feature identity cj. We already encountered such sampling 
algorithms in the previous chapter, where we discussed robot motion mod- 
els. Such sampling models are also desirable for sensor models. For example, 
when localizing a robot globally, it shall become useful to generate sample 
poses that incorporate a sensor measurement to generate initial guesses for 
the robot pose. 

While in the general case, sampling poses z; that correspond to a sensor 
measurement 2; is difficult, for our landmark model we can actually provide 
an efficient sampling algorithm. However, such sampling is only possible 
under further assumptions. In particular, we have to know the prior p(z; | 
ci, m). For simplicity, let us assume this prior is uniform (it generally is not!). 
Bayes rule then suggests that 


p(x [een = n p(fi | ci, £t, m) plar | ci, m) 
= np(fi | Tem) 


Sampling from p(x; | ft, ci, m) can now be achieved from the "inverse" of the 
sensor model p( f} | c, zi, m). Table 6.5 depicts an algorithm that samples 
poses x. The algorithm is tricky: Even in the noise-free case, a landmark 
observation does not uniquely determine the location of the robot. Instead, 
the robot may be on a circle around the landmark, whose diameter is the 
range to the landmark. The indeterminacy of the robot pose also follows 
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1 Algorithm sample_landmark_model_known_correspondence( fý, cj, m): 
2: j-d 

3: ^ = rand(0, 27) 

4: f = ri sample(o;.) 

5: $ó-di- sample(o;) 

6: £ 一 jz fcos^ 

7: Y = Mj y + fsin^ 

8: 6=4-n-o 

9: return (x y 0)" 








Table 6.5 Algorithm for sampling poses from a landmark measurement fi = 
(ri di si)" with known identity cj. 


from the fact that the range and bearing provide two constraints in a three- 
dimensional space of robot poses. 

To implement a pose sampler, we have to sample the remaining free pa- 
rameter, which determines where on the circle around the landmark the 
robot is located. This parameter is called 了 in Table 6.5, and is chosen at ran- 
dom in line3. Lines 4 and 5 perturb the measured range and bearing, exploit- 
ing the fact that the mean and the measurement are treated symmetrically in 
Gaussians. Finally, lines 6 through 8 recover the pose that corresponds to 5, 
f, and 9. 

Figure 6.13 illustrates the pose distribution p(x; |  fi,ci,m) (left 
diagram) and also shows a sample drawn with our algorithm sam- 
ple landmark model known correspondence (right diagram). The poste- 
rior is projected into x-y-space, where it becomes a ring around the measured 
range ri. In 3-D pose space, it is a spiral that unfolds the ring with the angle 
0. 





Further Considerations 


Both algorithms for landmark-based measurements assume known corre- 
spondence. The case of unknown correspondence will be discussed in detail 
in later chapters, when we address algorithms for localization and mapping 
under unknown correspondence. 
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Figure6.13 Landmark detection model: (a) Posterior distribution of the robot's pose 
given that it detected a landmark in 5m distance and 30deg relative bearing (projected 
onto 2-D). (b) Sample robot poses generated from such a detection. The lines indicate 
the orientation of the poses. 


A comment is in order on the topic of landmark signature. Most pub- 
lished algorithms do not make the use of appearance features explicit. When 
the signature is not provided, all landmarks look equal, and the data asso- 
ciation problem of estimating the correspondence variables is harder. We 
have included the signature in our model because it is a valuable source of 
information that can often be easily extracted from the sensor measurements. 

As noted above, the main motivation for using features instead of the full 
measurement vector is computational in nature: It is much easier to manage 
a few hundred features than a few billion range measurements. Our model 
presented here is extremely crude, and it clearly does not capture the physi- 
cal laws that underlie the sensor formation process. Nevertheless, the model 
tends to work well in a great number of applications. 

It is important to notice that the reduction of measurements into features 
comes at a price. In the robotics literature, features are often (mis)taken for 
sufficient statistics of the measurement vector z;. Let X bea variable of interest 
(e.g., a map, a pose), and Y some other information that we might bring to 
bear (e.g., past sensor measurements). Then f is a sufficient statistics of z, if 


P(X | f(z), Y) 


In practice, however, a lot of information is sacrificed by using features in- 
stead of the full measurement vector. This lost information makes certain 


AX |Y) = 
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problems more difficult, such as the data association problem of determin- 
ing whether or not the robot just revisited a previously explored location. It 
is easy to understand the effects of feature extraction by introspection: When 
you open your eyes, the visual image of your environment is probably suf- 
ficient to tell you unambiguously where you are—even if you were globally 
uncertain before. If, on the other hand, you only sense certain features, such 
as the relative location of door posts and windowsills, you would probably 
be much less certain as to where you are. Quite likely the information may 
be insufficient for global localization. 

With the advent of fast computers, features have gradually lost importance 
in the field of robotics. Especially when using range sensors, most state-of- 
the-art algorithms rely on dense measurement vectors, and they use dense 
location-based maps to represent the environment. Nevertheless, features 
are still great for educational purposes. They enable us to introduce the ba- 
sic concepts in probabilistic robotics, and with proper treatment of problems 
such as the correspondence problem they can be brought to bear even in 
cases where maps are composed of dense sets of scan points. For this reason, 
a number of algorithms in this book are first described for feature represen- 
tations, and then extended into algorithms using raw sensor measurements. 


Practical Considerations 


This section surveyed a range of measurement models. We placed a strong 
emphasis on models for range finders, due to their great importance in 
robotics. However, the models discussed here are only representatives of 
a much broader class of probabilistic models. In choosing the right model, it 
is important to trade off physical realism with properties that might be de- 
sirable for an algorithm using these models. For example, we noted that a 
physically realistic model of range sensors may yield probabilities that are 
not smooth in the alleged robot pose—which in turn causes problems for al- 
gorithms such as particle filters. Physical realism is therefore not the only 
criterion in choosing the right sensor model; an equally important criterion 
is the goodness of a model for the algorithm that utilizes it. 

As a general rule of thumb, the more accurate a model, the better. In par- 
ticular, the more information we can extract from a sensor measurement, the 
better. Feature-based models extract relatively little information, by virtue 
of the fact that feature extractors project high-dimensional sensor measure- 
ments into lower dimensional space. As a result, feature-based methods tend 
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to produce inferior results. This disadvantage is offset by superior computa- 
tional properties of feature-based representations. 

When adjusting the intrinsic parameters of a measurement model, it is of- 
ten useful to artificially inflate the uncertainty. This is because of a key limita- 
tion of the probabilistic approach: to make probabilistic techniques computa- 
tionally tractable, we have to ignore dependencies that exist in the physical 
world, along with a myriad of latent variables that cause these dependen- 
cies. When such dependencies are not modeled, algorithms that integrate 
evidence from multiple measurements quickly become overconfident. Such 
overconfidence can ultimately lead to wrong conclusions, which negatively af- 
fects the results. In practice, it is therefore a good rule of thumb to reduce the 
information conveyed by a sensor. Doing so by projecting the measurement 
into a low-dimensional feature space is one way of achieving this. However, 
it suffers the limitations mentioned above. Uniformly decaying the infor- 
mation by exponentiating a measurement model with a parameter a, as dis- 
cussed in Chapter 6.3.4, is a much better way, in that it does not introduce 
additional variance in the outcome of a probabilistic algorithm. 


Summary 


This section described probabilistic measurement models. 


。 Starting with models for range finders—and lasers in particular—we dis- 
cussed measurement models p(z}’ | x+, m). The first such model used ray 
casting to determine the shape of p(z7 | z;, m) for particular maps m and 
poses x+. We devised a mixture model that addressed the various types of 
noise that can affect range measurements. 


* We devised a maximum likelihood technique for identifying the intrin- 
sic noise parameters of the measurement model. Since the measurement 
model is a mixture model, we provided an iterative procedure for max- 
imum likelihood estimation. Our approach was an instance of the ex- 
pectation maximization algorithm, which alternates a phase that calcu- 
lates expectations over the type of error underlying a measurement, with 
a maximization phase that finds in closed form the best set of intrinsic 
parameters relative to these expectations. 


* An alternative measurement model for range finders is based on likeli- 
hood fields. This technique used the nearest distance in 2-D coordinates 
to model the probability p(z7 | z;, m). We noted that this approach tends 
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to yield smoother distributions p(z¥ | z;, m). This comes at the expense of 
undesired side effects: The likelihood field technique ignores information 
pertaining to free-space, and it fails to consider occlusions in the interpre- 
tation of range measurements. 


* A third measurement model is based on map matching. Map matching 
maps sensor scans into local maps, and correlates those maps with global 
maps. This approach lacks a physical motivation, but can be implemented 
very efficiently. 


* We discussed how pre-computation can reduce the computational bur- 
den at runtime. In the beam-based measurement model, the pre- 
computation takes place in 3-D; the likelihood field requires only a 2-D 
pre-computation. 


* We presented a feature-based sensor model, in which the robot extracts 
the range, bearing, and signature of nearby landmarks. Feature-based 
techniques extract from the raw sensor measurements distinct features. 
In doing so, they reduce the dimensionality of the sensor measurements 
by several orders of magnitude. 


* At the end of the chapter, a discussion on practical issues pointed out 
some of the pitfalls that may arise in concrete implementations. 


Bibliographical Remarks 


This chapter only skims the rich literature on physical modeling of sensors. More accurate 
models of sonar range sensor can be found in Blahut et al. (1991); Grunbaum et al. (1992) and in 
Etter (1996). Models of laser range finders are described by Rees (2001). An empirical discussion 
of proper noise models can be found in Sahin et al. (1998). Relative to these models, the models 
in this chapter are extremely crude. 

An early work on beam models for range sensors can be found in the seminal work by 
Moravec (1988). A similar model was later applied to mobile robot localization by Burgard et al. 
(1996). A beam-based model like the one described in this chapter together with the pre-caching 
of range measurements has first been described by Fox et al. (1999b). The likelihood fields have 
first been published by Thrun (2001), although they are closely related to the rich literature on 
scan matching techniques (Besl and McKay 1992). They in fact can be regarded as a soft variant 
of the correlation model described by Konolige and Chou (1999). Methods for computing the 
correlation between occupancy grid maps have also been quite popular. Thrun (1993) computes 
the sum of squared errors between the individual cells of two grid maps. Schiele and Crow- 
ley (1994) present a comparison of different models including correlation-based approaches. 
Yamauchi and Langley (1997) analyzed the robustness of map matching to dynamic environ- 
ments. Duckett and Nehmzow (2001) transform local occupancy grids into histograms that can 
be matched more efficiently. 
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Range and bearings measurements for point landmarks are commonplace in the SLAM lit- 
erature. Possibly the first mention is by Leonard and Durrant-Whyte (1991). In earlier work, 
Crowley (1989) devised measurement models for straight line objects. 


Exercises 


1. Many early robots navigating using features used artificial landmarks in 
the environment that were easy to recognize. A good place to mount such 
markers is a ceiling (why?). A classical example is a visual marker: Sup- 
pose we attach the following marker to the ceiling: 


Let the world coordinates of the marker be £m and ym, and its orientation 
relative to the global coordinate system 0,,. We will denote the robot's 
pose by £r, yr, and 6,. 


Now assume that we are given a routine that can detect the marker in 
the image plane of a perspective camera. Let x; and y; denote the coor- 
dinates of the marker in the image plane, and 6; its angular orientation. 
The camera has a focal length of f. From projective geometry, we know 
that each displacement d in z-y-space gets projected to a proportional dis- 
placement of d- £ in the image plane. (You have to make some choices on 
your coordinate systems; make these choices explicit). 


Your questions: 


(a) Describe mathematically where to expect the marker (in global coor- 
dinates £m, Ym, m) when its image coordinates are z;, yi, 0;, and the 
robot is at £r, Yr, Or. 

(b) Provide a mathematical equation for computing the image coordi- 
nates z;, yi, 0; from the robot pose £r, Yr, 6, and the marker coordinates 
Tm, Ym, Om 

(c) Now give a mathematical equation for determining the robot co- 


ordinates £r, Yr, 0r assuming we know the true marker coordinates 
Cm; Ym, Om and the image coordinates z;, yi, 0j. 
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(d) So far we assumed there is only a single marker. Now suppose there 
are multiple (indistinguishable) markers of the type shown above. 
How many such markers must a robot be able to see to uniquely iden- 
tify its pose? Draw such a configuration, and argue why it is sufficient. 


Hint: You don’t need to consider the uncertainty in the measurement for 
answering this question. Also, note that the marker is symmetrical. This 
has an impact on the answer of these questions! 


2. In this exercise, you will be asked to extend our calculation in the previ- 
ous exercise to include error covariances. To simplify the calculation, we 
now assume a non-symmetric marker whose absolute orientation can be 
estimated: 


al. 





Also for simplicity, we assume there shall be no noise in the orientation. 
However, the x-y estimates in the image plane will be noisy. Specifically, 
let the measurements be subject to zero-mean Gaussian noise with covari- 


ance 
c? 0 0 
2j" = 0 c? 0 
0 0 0 


for some positive value of o°. 

Calculate for the three questions above the corresponding covariances. In 

particular, 

(a) Given the image coordinates 2;,yi,0; and the robot coordinates 
Ly, Yr, Or, What is the error covariance for the values of £m, Ym, 05. 


(b) Given the robot coordinates Zr,yr,0 and the marker covariance 
Em, Ym, Om, What is your error covariance for the values of Zi yi, 0;? 


(c) Given the marker covariance %m,¥Ym,0m and the image coordinates 
i, yi, 0j, what is your error covariance for the values of £r, Yr, 0,.? 
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Notice that not all those distributions may be Gaussian. For this exercise, 
it is fine to apply a Taylor series expansion to attain a Gaussian posterior, 
but you have to explain how you did this. 


3. Now you are being asked to implement a routine sample marker model, 
which accepts as an input the location of a marker £m, Ym, 0m and the lo- 
cation of the perceived marker in the image plane z;, yi, 0;, and generates 
as an output samples of the robot pose 2, yr, 0... The marker is the same 
ambiguous marker as in Exercise 1: 





Generate a plot of samples for the robot coordinates Zr and y,, for the 
following parameters (you can ignore the orientation 0, in your plot). 




















problem # | £m Um Om Ti Yi 6i h/f o? 
#1 Ocm Ocm 0° | Oem Oem 0° | 200 | O.1em? 
#2 Ocm Ocm 0° | lem Oem O0? | 200 | O.1em? 
#3 Ocm Ocm 0° | 2em Oem 45° | 200 | 0.1em? 
#4 Ocm Ocm 0° | 2em Oem 45° | 200 | 1.0cm? 
#5 50cm 150cm 10° | lem 6cm 200? | 250 | 0.5em? 























All your plots should show coordinate axes with units. Notice: If you 
cannot devise an exact sampler, provide an approximate one and explain 
your approximations. 


4. For this exercise you need access to a robot with a sonar sensor, of the type 
often used in indoor robotics. Place the sensor in front of a flat wall, at a 
range d and an angle ¢. Measure the frequency at which the sensor detects 
the wall. Plot this value for different values of d (in 0.5 meter increments) 
and different values of ¢ (in 5 degree increments). What do you find? 
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Mobile Robot Localization: 
Markov and Gaussian 


This chapter introduces a number of concrete algorithms for mobile robot 
localization. Mobile robot localization is the problem of determining the pose 
of a robot relative to a given map of the environment. It is often called position 
estimation. Mobile robot localization is an instance of the general localization 
problem, which is the most basic perceptual problem in robotics. Nearly 
all robotics tasks require knowledge of the location of objects that are being 
manipulated. The techniques described in this and subsequent chapter are 
equally applicable for object localization tasks. 

Figure 7.1 depicts a graphical model for the mobile robot localization prob- 
lem. The robot is given a map of its environment and its goal is to determine 
its position relative to this map given the perceptions of the environment and 
its movements. 

Mobile robot localization can be seen as a problem of coordinate transfor- 
mation. Maps are described in a global coordinate system, which is indepen- 
dent of a robot’s pose. Localization is the process of establishing correspon- 
dence between the map coordinate system and the robot’s local coordinate 
system. Knowing this coordinate transformation enables the robot to express 
the location of objects of interest within its own coordinate frame—a neces- 
sary prerequisite for robot navigation. As the reader easily verifies, knowing 
the pose z, = (x y 0)" of the robot is sufficient to determine this coordinate 
transformation, assuming that the pose is expressed in the same coordinate 
frame as the map. 

Unfortunately—and herein lies the problem of mobile robot localization— 
the pose can usually not be sensed directly. Put differently, most robots do 
not possess a noise-free sensor for measuring pose. The pose therefore has 
to be inferred from data. A key difficulty arises from the fact that a single 
sensor measurement is usually insufficient to determine the pose. Instead, 
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fa) feu) 
e) 1 Q0 10) 
© © 


Figure 7.1 Graphical model of mobile robot localization. The value of shaded nodes 
are known: the map m, the measurements z, and the controls u. The goal of localiza- 
tion is to infer the robot pose variables x. 


the robot has to integrate data over time to determine its pose. To see why 
this is necessary, just picture a robot located inside a building where many 
corridors look alike. Here a single sensor measurement (e.g., a range scan) is 
usually insufficient to identify the specific corridor. 

Localization techniques have been developed for a broad set of map rep- 
resentations. We already discussed two types of maps: feature-based and 
location-based. An example of the latter was occupancy grid maps, which are 
subject to a later chapter in this book. Some other types of maps are shown 
in Figure 7.2. This figure shows a hand-drawn metric 2-D map, a graph-like 
topological map, an occupancy grid map, and an image mosaic of a ceiling 
(which can also be used as a map). Later chapters will investigate specific 
map types and discuss algorithms for acquiring maps from data. Localiza- 
tion assumes that an accurate map is available. 

In this and the subsequent chapter, we present some basic probabilistic 
algorithms for mobile localization. All of these algorithms are variants of 
the basic Bayes filter described in Chapter 2. We discuss the advantages and 
shortcomings of each representation and associated algorithms. The chapter 
also goes through a series of extensions that address different localization 
problems, as defined through the following taxonomy. 
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Figure 7.2 Example maps used for robot localization: (a) a manually constructed 
2-D metric layout, (b) a graph-like topological map, (c) an occupancy grid map, and 


(d) an image mosaic of a ceiling. (d) courtesy of Frank Dellaert, Georgia Institute of 
Technology. 


7.1 A Taxonomy of Localization Problems 


Not every localization problem is equally hard. To understand the difficulty 
of a localization problem, let us first briefly discuss a taxonomy of localiza- 
tion problems. This taxonomy divides localization problems along a number 
of important dimensions pertaining to the nature of the environment and 


the initial knowledge that a robot may possess relative to the localization 
problem. 


Local Versus Global Localization Localization problems are characterized 
by the type of knowledge that is available initially and at run-time. We dis- 


tinguish three types of localization problems with an increasing degree of 
difficulty. 


POSITION TRACKING Position tracking assumes that the initial robot pose is known. Localizing 


the robot can be achieved by accommodating the noise in robot motion. The 
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effect of such noise is usually small. Hence, methods for position tracking of- 
ten rely on the assumption that the pose error is small. The pose uncertainty 
is often approximated by a unimodal distribution (e.g., a Gaussian). The po- 
sition tracking problem is a local problem, since the uncertainty is local and 
confined to region near the robot's true pose. 

In global localization, the initial pose of the robot is unknown. The robot 
is initially placed somewhere in its environment, but it lacks knowledge of 
its whereabouts. Approaches to global localization cannot assume bounded- 
ness of the pose error. As we shall see later in this chapter, unimodal prob- 
ability distributions are usually inappropriate. Global localization is more 
difficult than position tracking; in fact, it subsumes the position tracking 
problem. 

The kidnapped robot problem is a variant of the global localization problem, 
but one that is even more difficult. During operation, the robot can get kid- 
napped and teleported to some other location. The kidnapped robot problem 
is more difficult than the global localization problems, in that the robot might 
believe it knows where it is while it does not. In global localization, the robot 
knows that it does not know where it is. One might argue that robots are 
rarely kidnapped in practice. The practical importance of this problem, how- 
ever, arises from the observation that most state-of-the-art localization algo- 
rithms cannot be guaranteed never to fail. The ability to recover from failures 
is essential for truly autonomous robots. Testing a localization algorithm by 
kidnapping it measures its ability to recover from global localization failures. 


Static Versus Dynamic Environments A second dimension that has a sub- 
stantial impact on the difficulty of localization is the environment. Environ- 
ments can be static or dynamic. 

Static environments are environments where the only variable quantity 
(state) is the robot’s pose. Put differently, only the robot moves in static envi- 
ronment. All other objects in the environments remain at the same location 
forever. Static environments have some nice mathematical properties that 
make them amenable to efficient probabilistic estimation. 

Dynamic environments possess objects other than the robot whose loca- 
tion or configuration changes over time. Of particular interest are changes 
that persist over time, and that impact more than a single sensor reading. 
Changes that are not measurable are of course of no relevance to localiza- 
tion, and those that affect only a single measurement are best treated as noise 
(cf. Chapter 2.4.4). Examples of more persistent changes are: people, daylight 
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(for robots equipped with cameras), movable furniture, or doors. Clearly, 
most real environments are dynamic, with state changes occurring at a range 
of different speeds. 

Obviously, localization in dynamic environments is more difficult than 
localization in static ones. There are two principal approaches for accom- 
modating dynamics: First, dynamic entities might be included in the state 
vector. As a result, the Markov assumption might now be justified, but such 
an approach carries the burden of additional computational and modeling 
complexity. Second, in certain situations sensor data can be filtered so as to 
eliminate the damaging effect of unmodeled dynamics. Such an approach is 
described further below in Chapter 8.4. 


Passive Versus Active Approaches A third dimension that characterizes 
different localization problems pertains to the fact whether or not the local- 
ization algorithm controls the motion of the robot. We distinguish two cases: 

In passive localization, the localization module only observes the robot op- 
erating. The robot is controlled through some other means, and the robot's 
motion is not aimed at facilitating localization. For example, the robot might 
move randomly or perform its everyday tasks. 

Active localization algorithms control the robot so as to minimize the local- 
ization error and/or the costs arising from moving a poorly localized robot 
into a hazardous place. 

Active approaches to localization usually yield better localization results 
than passive ones. We already discussed an example in the introduction to 
this book: coastal navigation. A second example situation is shown in Fig- 
ure 7.3. Here the robot is located in a symmetric corridor, and its belief after 
navigating the corridor for a while is centered at two (symmetric) poses. The 
local symmetry of the environment makes it impossible to localize the robot 
while in the corridor. Only if it moves into a room will it be able to elimi- 
nate the ambiguity and to determine its pose. It is situations like these where 
active localization gives much better results: Instead of merely waiting until 
the robot accidentally moves into a room, active localization can recognize 
the impasse and escape from it. 

However, a key limitation of active approaches is that they require con- 
trol over the robot. Thus, in practice, an active localization technique alone 
tends to be insufficient: The robot has to be able to localize itself even when 
carrying out some other task than localization. Some active localization tech- 
niques are built on top of a passive technique. Others combine task perfor- 
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Figure 7.3 Example situation that shows a typical belief state during global localiza- 
tion in a locally symmetric environment. The robot has to move into one of the rooms 
to determine its location. 


mance goals with localization goals when controlling a robot. 
This chapter exclusively considers passive localization algorithms. Active 
localization will be discussed in Chapter 17. 


Single-Robot Versus Multi-Robot A fourth dimension of the localization 
problem is related to the number of robots involved. 

Single-robot localization is the most commonly studied approach to local- 
ization. It deals with a single robot only. Single robot localization offers the 
convenience that all data is collected at a single robot platform, and there is 
no communication issue. 

The multi-robot localization problem arises in teams of robots. At first 
glance, each robot could localize itself individually, hence the multi-robot lo- 
calization problem can be solved through single-robot localization. If robots 
are able to detect each other, however, there is the opportunity to do better. 
This is because one robot's belief can be used to bias another robot's belief 
if knowledge of the relative location of both robots is available. The issue of 
multi-robot localization raises interesting, non-trivial issues on the represen- 
tation of beliefs and the nature of the communication between them. 

These four dimensions capture the four most important characteristics of 
the mobile robot localization problem. There exist a number of other charac- 
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Algorithm Markov_localization(bel (x, 1), we, z+, m): 
for all x, do 
bel(z;) = J plai | us, a3, m) pel(z 1) dzi li 
bel(z,) = N p(% | £t, m) bel(z;) 
endfor 


return bel(x;) 











Table 7.1 Markov localization. 


terizations that impact the hardness of the problem, such as the information 
provided by robot measurements and the information lost through motion. 
Also, symmetric environments are more difficult than asymmetric ones, due 
to the higher degree of ambiguity. 


Markov Localization 


Probabilistic localization algorithms are variants of the Bayes filter. The 
straightforward application of Bayes filters to the localization problem is 
called Markov localization. Table 7.1 depicts the basic algorithm. This algo- 
rithm is derived from the algorithm Bayes filter (Table 2.1 on page 27). No- 
tice that Markov localization also requires a map m as input. The map plays 
a role in the measurement model p(z; | x+, m) (line 4). It often, but not always, 
is incorporated in the motion model p(x; | ut, x+—-1, m) as well (line 3). Just 
like the Bayes filter, Markov localization transforms a probabilistic belief at 
time t — 1 into a belief at time t. Markov localization addresses the global lo- 
calization problem, the position tracking problem, and the kidnapped robot 
problem in static environments. 

The initial belief, bel(xo), reflects the initial knowledge of the robot's pose. 
It is set differently depending on the type of localization problem. 


* Position tracking. If the initial pose is known, bel(zo) is initialized by a 


point-mass distribution. Let zo denote the (known) initial pose. Then 


1 if Xo = To 
0 otherwise 


bel(zo) = { 
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Figure 7.4 Example environment used to illustrate mobile robot localization: One- 
dimensional hallway environment with three indistinguishable doors. Initially the 
robot does not know its location except for its heading direction. Its goal is to find 
out where it is. 


Point-mass distributions are discrete and therefore do not possess a den- 
sity. 


In practice the initial pose is often just known in approximation. The be- 
lief bel(zo) is then usually initialized by a narrow Gaussian distribution 
centered around zo. Gaussians were defined in Equation (2.4) on page 15. 


1 
bel(zo) = det (2r2) 2 exp([-i(zo — Zo)’ X !(xo — Zo) } 
e 
~N (zo;£o,X) 


X is the covariance of the initial pose uncertainty. 


e Global localization. If the initial pose is unknown, bel(xo) is initialized 
by a uniform distribution over the space of all legal poses in the map: 


1 
bel = X 
SACS 
where |X| stands for the volume (Lebesgue measure) of the space of all 
poses within the map. 


* Other. Partial knowledge of the robot's position can usually easily be 
transformed into an appropriate initial distribution. For example, if the 
robot is known to start next to a door, one might initialize bel(zo) using 
a density that is zero except for places near doors, where it may be uni- 
form. If it is known to be located in a specific corridor, one might initialize 
bel(xo) by a uniform distribution in the area of the corridor and zero any- 
where else. 
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(a) 


(b) 


(c) 


(d) 


(e) 


Figure 7.5 Illustration of the Markov localization algorithm. Each picture depicts 
the position of the robot in the hallway and its current belief bel(x). (b) and (d) ad- 


ditionally depict the observation model p(z: | x+), which describes the probability of 


observing a door at the different locations in the hallway. 
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Illustration of Markov Localization 


We have already discussed Markov localization in the introduction to this 
book, as a motivating example for probabilistic robotics. Now we can back 
up this example using a concrete mathematical framework. Figure 7.4 de- 
picts our one-dimensional hallway with three identical doors. The initial 
belief bel(xo) is uniform over all poses, as illustrated by the uniform density 
in Figure 7.5a. As the robot queries its sensors and notices that it is adja- 
cent to one of the doors, it multiplies its belief bel(zxo) by p(z; | xi, m), as 
stated in line 4 of our algorithm. The upper density in Figure 7.5b visual- 
izes p(z | x+, m) for the hallway example. The lower density is the result of 
multiplying this density into the robot's uniform prior belief. Again, the re- 
sulting belief is multi-modal, reflecting the residual uncertainty of the robot 
at this point. 

As the robot moves to the right, indicated in Figure 7.5c, line 3 of the 
Markov localization algorithm convolves its belief with the motion model 
plx: | Ut, 24-1). The motion model p(z, | uz, 24-1) is not focused on a sin- 
gle pose but on a whole continuum of poses centered around the expected 
outcome of a noise-free motion. The effect is visualized in Figure 7.5c, which 
shows a shifted belief that is also flattened out, as a result of the convolution. 

The final measurement is illustrated in Figure 7.5d. Here the Markov lo- 
calization algorithm multiplies the current belief with the perceptual prob- 
ability p(z; | x+). At this point, most of the probability mass is focused on 
the correct pose, and the robot is quite confident of having localized itself. 
Figure 7.5e illustrates the robot's belief after having moved further down the 
hallway. 

We already noted that Markov localization is independent of the under- 
lying representation of the state space. In fact, Markov localization can be 
implemented using any of the representations discussed in Chapter 2. We 
now consider three different representations and devise practical algorithms 
that can localize mobile robots in real time. We begin with Kalman filters, 
which represent beliefs by their first and second moment. We then continue 
with discrete, grid representations and finally introduce algorithms using 
particle filters. 
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EKF Localization 


The extended Kalman filter localization algorithm, or EKF localization, is a spe- 
cial case of Markov localization. EKF localization represents beliefs bel(z;) 
by their first and second moment, the mean ji; and the covariance X,. The 
basic EKF algorithm was stated in Table 3.3 in Chapter 3.3 (page 59). EKF lo- 
calization shall be our first concrete implementation of an EKF in the context 
of an actual robotics problem. 

Our EKF localization algorithm assumes that the map is represented by 
a collection of features. At any point in time t, the robot gets to observe a 
vector of ranges and bearings to nearby features: z; = {z}, 27,...}. We begin 
with a localization algorithm in which all features are uniquely identifiable. 
The existence of uniquely identifiable features may not be a bad assumption: 
For example, the Eiffel Tower in Paris is a landmark that is rarely confused 
with other landmarks, and it is widely visible throughout Paris. The identity 
of a feature is expressed by a set of correspondence variables, denoted ci, one 
for each feature vector zt. Correspondence variables were already discussed 
in Chapter 6.6. Let us first assume that the correspondences are known. We 
then progress to a more general version that allows for ambiguity among 
features. The second, more general version applies a maximum likelihood 
estimator to estimate the value of the latent correspondence variable, and 
uses the result of this estimation as if it were ground truth. 


Illustration 


Figure 7.6 illustrates the EKF localization algorithm using our example 
of mobile robot localization in the one-dimensional corridor environment 
(cf. Figure 7.4). To accommodate the unimodal shape of the belief in EKFs, 
we make two convenient assumptions: First, we assume that the correspon- 
dences are known. We attach unique labels to each door (1, 2, and 3), and 
we denote the measurement model by p(z; | zi, m, ct), where m is the map 
and c; € {1,2,3} is the identity of the door observed at time t. Second, we 
assume that the initial pose is relatively well known. A typical initial belief 
is represented by the Gaussian distribution shown in Figure 7.6a, centered 
on the area near Door 1 and with a Gaussian uncertainty as indicated in that 
figure. As the robot moves to the right, its belief is convolved with the mo- 
tion model. The resulting belief is a shifted Gaussian of increased width, as 
shown in Figure 7.6b. 

Now suppose the robot detects that it is in front of door c; — 2. The upper 
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Figure 7.6 Application of the Kalman filter algorithm to mobile robot localization. 
All densities are represented by unimodal Gaussians. 


density in Figure 7.6c visualizes p(z; | £4, m, c¢) for this observation—again a 
Gaussian. Folding this measurement probability into the robot's belief yields 
the posterior shown in Figure 7.6c. Note that the variance of the resulting 
belief is smaller than the variances of both the robot's previous belief and 
the observation density. This is natural, since integrating two independent 
estimates should make the robot more certain than each estimate in isola- 
tion. After moving down the hallway, the robot's uncertainty in its position 
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increases again, since the EKF continues to incorporate motion uncertainty 
into the robot's belief. Figure 7.6d shows one of these beliefs. This example 
illustrates the Kalman filter in our limited setting. 


The EKF Localization Algorithm 


The discussion thus far has been fairly abstract: We have silently assumed the 
availability of an appropriate motion and measurement model, and have left 
unspecified a number of key variables in the EKF update. We now discuss 
a concrete implementation of the EKF for feature-based maps. Our feature- 
based maps consist of point landmarks, as already discussed in Chapter 6.2. 
For such point landmarks, we use the common measurement model dis- 
cussed in Chapter 6.6. We also adopt the velocity motion model defined 
in Chapter 5.3. The reader may take a moment to briefly reacquire the ba- 
sic measurement and motion equations discussed in these chapters before 
reading on. 

Table 7.2 describes EKF localization known correspondences, the EKF 
algorithm for localization with known correspondences. This algorithm is 
derived from the EKF in Table 3.3 in Chapter 3. It requires as its input a 
Gaussian estimate of the robot pose at time t — 1, with mean u+- and co- 
variance X,. . Further, it requires a control u+, a map m, and a set of features 
ze = (21, 22,...] measured at time t, along with the correspondence variables 
c, = (el cd... ). Its output is a new, revised estimate u, X+, along with the 
likelihood of the feature observation, p,,. The algorithm does not handle the 
case of straight motion for which w, = 0. The treatment of this special case is 
left as an exercise. 

The individual calculations in this algorithm are explained further below. 
Lines 3 and 4 compute the Jacobians needed for the linearized motion model. 
Line 5 determines the motion noise covariance matrix from the control. Lines 
6 and 7 implement the familiar motion update. The predicted pose after the 
motion is calculated as j; in line 6, and line 7 computes the corresponding 
uncertainty ellipse. The measurement update (correction step) is realized 
through Lines 8 to 21. The core of this update is a loop through all features i 
Observed at time t. In line 10, the algorithm assigns to j the correspondence 
of the i-th feature in the measurement vector. It then calculates a predicted 
measurement 2 and the Jacobian H? of the measurement model. Using this 
Jacobian, the algorithm determines 57, the uncertainty corresponding to the 
predicted measurement 2;. The Kalman gain Kj is then calculated in line 
15. The estimate is updated in lines 16 and 17, once for each feature. Lines 
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2: 





1: Algorithm EKF_localization_known_correspondences(ji;—1, 31-1, Ut, Zt, Ct, m): 
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endfor 
bt = pu 
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pz, = [[; det (2551) 
return pi, Ut, pz, 


exp[-3 Gi — &)"[S;] Gi — 20] 








Table7.2 The extended Kalman filter (EKF) localization algorithm, formulated here 
for a feature-based map and a robot equipped with sensors for measuring range and 
bearing. This version assumes knowledge of the exact correspondences. 
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19 and 20 set the new pose estimate, followed by the computation of the 
measurement likelihood in line 21. In this algorithm, care has to be taken 
when computing the difference of two angles, since the result may be off by 
2n. 


Mathematical Derivation of EKF Localization 


Prediction Step (Lines 3-7) The EKF localization algorithm uses the mo- 
tion model defined in Equation (5.13). Let us briefly restate the definition: 





" m -2 sinô + & sin(0 + &At) 
Pa ave 

y! = y |+ = cos 0 = cos(0 + Gà At) 

0 0 a, At 


Here x:-1 = (x y 0)? and a; = (x' y 0')" are the state vectors at time t — 1 
and t, respectively. The true motion is described by a translational velocity, 
oz, and a rotational velocity, &;. As already stated in Equation (5.10), these 
velocities are generated by the motion control, u, = (v, w+)”, with additive 


Gaussian noise: 


( K ) = ( T )+ c ARS ( s ) eto M) 

Ut Ut Easu2 十 adw2 Ct 

We already know from Chapter 3 that EKF localization maintains a local pos- 
terior estimate of the state, represented by the mean j1.; and covariance 
34.1. We also recall that the “trick” of the EKF lies in linearizing the motion 


and measurement model. For that, we decompose the motion model into a 
noise-free part and a random noise component: 











a z£ =Z sind + 2 sin( + w,At) 
y! - y |+| co58—2* cos(0--u, At) | +N(0, Re) 
6 0 ws At 
b 
Tt glut, t1) 


Equation (7.6) approximates Equation (7.4) by replacing the true motion 
(6, &«)T by the executed control (v; w+), and capturing the motion noise 
in an additive Gaussian with zero mean. Thus the left term in Equation (7.6) 
treats the control as if it were the true motion of the robot. We recall from 
Chapter 3.3 that EKF linearization approximates the function g through a 
Taylor expansion: 


glut, 2151) © glut, pii) + Gi rici — pia) 
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The function g(u:, 4—1) is simply obtained by replacing the exact state 
xt-1—-which we do not know 一 by our expectation /1 一 which we know. 
The Jacobian G; is the derivative of the function g with respect to x;.. eval- 
uated at u; and p1: 








Oa Ox’ Ox’ 

Ont, Opt, OLt—1,0 

G, = glut, Ht) —— ay! ay’ ay’ 
t T Ox 1 Oli, Olt—-1,y Oi —1,0 
z 90' D0/ 06" 





Oii Oii OL 1,0 





Here jj 3 = (pi ig Mt-1,y Ht 1.0)? denotes the mean estimate factored 
into its individual three values, and mos is short for the derivative of g 
along the x’ dimension, taken with respect to x at jj 1. Calculating these 


derivatives from Equation (7.6) gives us the following matrix: 





1 0 i^ COS 441,0 + CoS(pt—1,0 + wt I) 
Gi = 0 1 aot (— sin fe—1,9 + sin(He-1,6 + oA) 


0 0 1 





To derive the covariance of the additional motion noise, A (0, R+), we first 
determine the covariance matrix M; of the noise in control space. This follows 
directly from the motion model in Equation (7.5): 


M, = au? + aw? 0 
1 = 
0 a3v7 + ow? 


The motion model in (7.6) requires this motion noise to be mapped into state 
space. The transformation from control space to state space is performed by 
another linear approximation. The Jacobian needed for this approximation, 
denoted V;, is the derivative of the motion function g with respect to the 
motion parameters, evaluated at u; and Ht -1: 
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2 T 
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0 At 


The multiplication V, M; V;" then provides an approximate mapping be- 
tween the motion noise in control space to the motion noise in state space. 
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With this derivation, lines 6 and 7 of the EKF localization algorithm corre- 
spond exactly to the prediction updates of the general EKF algorithm, de- 
scribed in Table 3.3. 


Correction Step (Lines 8-20) To perform the correction step, EKF localiza- 
tion also requires a linearized measurement model with additive Gaussian 
noise. The measurement model for our feature-based maps shall be a variant 
of Equation (6.40) in Chapter 6.6, which presupposes knowledge of the land- 
mark identity via the correspondence variable c;. Let j = cj be the identity 
of the landmark that corresponds to the i-th component in the measurement 
vector. Then we have 











"i V (mj, — x)? (ms, — y? 
内 一 atan2(mj,y — Y, Mj z x) 一 0 十 NM (0， Q:), 
sí TI s 
——— S 
zi h(z4,j,m) 


where (mj, m;jj)^ are the coordinates of the i-th landmark detection at 
time t, and mj s is its (correct) signature. The Taylor approximation of this 
measurement model is 


h(z,, j, m) RI h(is, j, m) 十 H? (zi — fix). 


Hj is the Jacobian of h with respect to the robot location, computed at the 
predicted mean ji: 
or} or} ori 
—— Ola Olt, y olt, o 
gi 2 heim) (€ api ə ağ 
EOT 5 Ip 8j 
Ox, Htm Hity Lt, 
Os; Os; Os; 
Olt,» Ofit,y Fto 

















Mj, — tu mj,y—hty 0 
va » 
一 Mi y` Ht,y Mj, s — Htm 1 
q q 
0 0 0 


with q short for (Mj — Hte)? + (mj, — Aty)”. Notice that the last row of 
Hj is all zero. This is because the signature does not depend on the robot 
pose. The effect of this degeneracy is that the observed signature s! has no 
effect on the result of the EKF update. This should come at no surprise: 
knowledge of the correct correspondence cj renders the observed signature 
entirely uninformative. 
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The covariance Q; of the additional measurement noise in Equation (7.12) 
follows directly from (6.40): 


o 0 0 
Qi = 0 05 0 
0 0 c 


Finally, we note that our feature-based localizer processes multiple measure- 
ments at a time, whereas the EKF discussed in Chapter 3.2 only processed 
a single sensor item. Our algorithm relies on an implicit conditional inde- 
pendence assumption, which we briefly discussed in Chapter 6.6, Equation 
(6.39). Essentially, we assume that all feature measurement probabilities are 
independent given the pose z+, the landmark identities c;, and the map m: 


p(z | Tecn m) = LL» | zc, ct, m) 
i 


This is usually a good assumption, especially if the world is static. It enables 
us to incrementally add the information from multiple features into our filter, 
as specified in lines 9 through 18 in Table 7.2. Care has to be taken that the 
pose estimate is updated in each iteration of the loop, since otherwise the 
algorithm computes incorrect observation predictions (intuitively, this loop 
corresponds to multiple observation updates with zero motion in between). 
With this in mind it is straightforward to see that lines 8-20 are indeed an 
implementation of the general EKF correction step. 


Measurement Likelihood (Line 21) Line 21 computes the likelihood p(z; | 
Cit, M, 214-1, U1:4) Of a measurement z;. This likelihood is not essential for 
the EKF update but is useful for the purpose of outlier rejection or in the 
case of unknown correspondences. Assuming independence between the 
individual feature vectors, we can restrict the derivation to individual feature 
vectors zt and compute the overall likelihood analogous to (7.16). For known 
data associations cl:ty the likelihood can be computed from the predicted 
belief bel(z,) = N (zi; fit, 34) by integrating over the pose z;, and omitting 
irrelevant conditioning variables: 


plz | €1:4, M, Z1:t—1; U1st) 


= [ee | Lt, Cit, m, £141,914) P(Lt | eia, M, z14 51,914) Axe 


— foe | Ti, c, m) plz: | C1:t—1; M, Z1:t—1; 01:4) da, 
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2 / plot | s, cd m) Belay) dis 


The left term in the final integral is the measurement likelihood assuming 
knowledge of the robot location x+. This likelihood is given by a Gaussian 
with mean at the measurement that is expected at location z+. This measure- 
ment, denoted 27, is provided by the measurement function h. The covari- 
ance of the Gaussian is given by the measurement noise Q+. 


p(zi | Ti, ci, m) m N (25; h(x, cj, m), Qi) 
~x (x hiec, m) + Hi (zi i), Qt) 


(7.18) follows by applying our Taylor expansion (7.13) to h. Plugging this 
equation back into (7.17), and replacing bel(x,) by its Gaussian form, we get 
the following measurement likelihood: 
plz | cie, M, 214-1, U1:t) 

& N (24; h(fie, ci, m) + Hi (zx, — fie), Q) @ N (zi; Bo Ee) 
where & denotes the familiar convolution over the variable x+. This equa- 
tion reveals that the likelihood function is a convolution of two Gaussians; 
one representing the measurement noise, the other representing the state 
uncertainty. We already encountered integrals of this form in Chapter 3.2, 
where we derived the motion update of the Kalman filter and the EKF. The 
closed-form solution to this integral is derived completely analogously to 
those derivations. In particular, the Gaussian defined by (7.19) has mean 
h(i, i, m) and covariance H; X; HF + Q;. Thus, we have under our linear 
approximation the following expression for the measurement likelihood: 


plz | C1:t, M, Z1:t—1; Ur:t) end N (zi; (fiz, ci, m), Hi 2 Hy + Qi) 
That is, 


plz | C1:t) M, Z1:4—1, Unt) 


1. 7 
= exp] 54h - hu eom)? [He Es HE +Q] GE -hao cham) } 


By replacing the mean and covariance of this expression by 2} and S+, respec- 
tively, we get line 21 of the EKF algorithm in Table 7.2. 

The EKF localization algorithm can now easily be modified to accommo- 
date outliers. The standard approach is to only accept landmarks for which 
the likelihood passes a threshold test. This is generally a good idea: Gaus- 
sians fall off exponentially, and a single outlier can have a huge effect on the 
pose estimate. In practice, thresholding adds an important layer of robust- 
ness to the algorithm without which EKF localization tends to be brittle. 
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Figure 7.7 AIBO robots on the RoboCup soccer field. Six landmarks are placed at 
the corners and the midlines of the field. 


Physical Implementation 


We now illustrate the EKF algorithm using simulations of a four-legged 
AIBO robot localizing on a RoboCup soccer field. Here, the robot local- 
izes using six uniquely colored markers placed around the field (see Fig- 
ure 7.7). Just like in the EKF algorithm given in Table 7.2, motion control 
uz = (vi wz)? is modeled by translational and rotational velocity, and obser- 
vations z, = (rt ¢ s+)’ measure relative distance and bearing to a marker. 
For simplicity we assume that the robot detects only one landmark at a time. 


Prediction Step (Lines 3-7) Figure 7.8 illustrates the prediction step of the 
EKF localization algorithm. Shown there are the prediction uncertainties re- 
sulting from different motion noise parameters, a; — a4, used in line 5 of 
the algorithm. The parameters o» and as are set to 5% in all visualizations. 
The main translational and rotational noise parameters o; and o4 vary be- 
tween (10%, 10%), (3096, 10%), (1096, 30%), and (30%, 3096) (from upper left 
to lower right in Figure 7.8). In each of the plots the robot executes the con- 
trol u; = (10cm/sec, 5? /sec) for 9 seconds, resulting in a circular arc of length 
90cm and rotation 45?. The robot's previous location estimate is represented 
by the ellipse centered at the mean Ht -1 = (80, 100, 0). 

The EKF algorithm computes the predicted mean ji; by shifting the previ- 
ous estimate under the assumption of noise free motion (line 6). The corre- 
sponding uncertainty ellipse, 5,, consists of two components; one estimating 
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Figure 7.8 Prediction step of the EKF algorithm. The panels were generated with 
different motion noise parameters. The robot's initial estimate is represented by the 
ellipse centered at Mi-1. After moving on a circular arc of 90cm length while turn- 
ing 45 degrees to the left, the predicted position is centered at j. In panel (a), the 
motion noise is relatively small in both translation and rotation. The other panels 
represent (b) high translational noise, (c) high rotational noise, and (d) high noise in 
both translation and rotation. 


uncertainty due to the initial location uncertainty, the other estimating uncer- 
tainty due to motion noise (line 7). The first component, G; 2 _1G7 ， ignores 
motion noise and projects the previous uncertainty X;..; through a linear ap- 
proximation of the motion function. Recall from Equations (7.8) and (7.9) 
that this linear approximation is represented by the matrix G+, which is the 
Jacobian of the motion function w.r.t. the previous robot location. 

The resulting noise ellipses are identical in the four panels since they do 
not consider motion noise. Uncertainty due to motion noise is modeled by 
the second component of X, which is given by V; Mj p-- (line 7). The matrix 
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Figure 7.9 Measurement prediction. The left plots show two predicted robot loca- 
tions along with their uncertainty ellipses. The true robot and the observation are 
indicated by the white circle and the bold line, respectively. The panels on the right 
show the resulting measurement predictions. The white arrows indicate the innova- 
tions, the differences between observed and predicted measurements. 


M; represents the motion noise in control space (line 5). This motion noise 
matrix is mapped into state space by multiplication with V;, which is the 
Jacobian of the motion function w.r.t. motion control (line 4). As can be seen, 
the resulting ellipse represents large translational velocity error (al = 30%) 
by large uncertainty along the motion direction (right plots in Figure 7.8). 
Large rotational error (a4 = 30%) results in large uncertainty orthogonal to 
the motion direction (lower plots in Figure 7.8). The overall uncertainty of 
the prediction, ¥;, is then given by adding the two uncertainty components. 


Correction Step: Measurement Prediction (Lines 8-14) In the first part of 
the correction step, the EKF algorithm predicts the measurement, z;, using 
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Figure 7.10 Correction step of the EKF algorithm. The panels on the left show the 
measurement prediction, and the panels on the right the resulting corrections, which 
update the mean estimate and reduce the position uncertainty ellipses. 


the predicted robot location and its uncertainty. Figure 7.9 illustrates the 
measurement prediction. The left plots show the predicted robot locations 
along with their uncertainty ellipses. The true robot location is indicated by 
the white circle. Now assume that the robot observes the landmark ahead to 
its right, as indicated by the bold line. The panels on the right show the cor- 
responding predicted and actual measurements in measurement space. The 
predicted measurement, z;, is computed from the relative distance and bear- 
ing between the predicted mean, ji;, and the observed landmark (line 12). 
The uncertainty in this prediction is represented by the ellipse S;. Similar 
to state prediction, this uncertainty results from a convolution of two Gaus- 
sians. The ellipse Q; represents uncertainty due to measurement noise (line 
8), and the ellipse H; 5 HT represents uncertainty due to uncertainty in the 
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robot location. The robot location uncertainty X, is mapped into observation 
uncertainty by multiplication with H;, the Jacobian of the measurement func- 
tion w.r.t. the robot location (line 13). S+, the overall measurement prediction 
uncertainty, is then the sum of these two ellipses (line 14). The white ar- 
rows in the panels on the right illustrate the so-called innovation vector z; — zi, 
which is simply the difference between the observed and the predicted mea- 
surement. This vector plays a crucial role in the subsequent update step. It 
also provides the likelihood of the measurement z;, which is given by the 
likelihood of the innovation vector under a zero mean Gaussian with covari- 
ance S; (line 21). That is, the "shorter" (in the sense of Mahalanobis distance) 
the innovation vector, the more likely the measurement. 


Correction Step: Estimation Update (Lines 15-21) The correction step of 
the EKF localization algorithm updates the location estimate based on the 
innovation vector and the measurement prediction uncertainty. Figure 7.10 
illustrates this step. For convenience, the panels on the left show the mea- 
surement prediction again. The panels on the right illustrate the resulting 
corrections in the position estimates, as shown by the white arrows. These 
correction vectors are computed by a scaled mapping of the measurement in- 
novation vectors (white arrows in left panels) into state space (line 16). This 
mapping and scaling is performed by the Kalman gain matrix, K;, computed 
in line 15. Intuitively, the measurement innovation gives the offset between 
predicted and observed measurement. This offset is then mapped into state 
space and used to move the location estimate in the direction that would re- 
duce the measurement innovation. The Kalman gain additionally scales the 
innovation vector, thereby considering the uncertainty in the measurement 
prediction. The more certain the observation, the higher the Kalman gain, 
and hence the stronger the resulting location correction. The uncertainty el- 
lipse of the location estimate is updated by similar reasoning (line 17). 


Example Sequence Figure 7.11 shows two sequences of EKF updates, us- 
ing different observation uncertainties. The left panels show the robot's tra- 
jectories according to the motion control (dashed lines) and the resulting true 
trajectories (solid lines). Landmark detections are indicated by thin lines, 
with the measurements in the upper panel being less noisy. The dashed lines 
in the right panels plot the paths as estimated by the EKF localization algo- 
rithm. As expected, the smaller measurement uncertainty in the upper row 
results in smaller uncertainty ellipses and in smaller estimation errors. 
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Figure 7.11 EKF-based localization with an accurate (upper row) and a less accurate 
(lower row) landmark detection sensor. The dashed lines in the left panel indicate the 
robot trajectories as estimated from the motion controls. The solid lines represent the 
true robot motion resulting from these controls. Landmark detections at five locations 
are indicated by the thin lines. The dashed lines in the right panels show the corrected 
robot trajectories, along with uncertainty before (light gray, ©) and after (dark gray, 
X+) incorporating a landmark detection. 


Estimating Correspondences 


EKF Localization with Unknown Correspondences 


The EKF localization discussed thus far is only applicable when landmark 
correspondences can be determined with absolute certainty. In practice, this 
is rarely the case. Most implementations therefore determine the identity of 
the landmark during localization. Throughout this book, we will encounter 
a number of strategies to cope with the correspondence problem. The most 
simple of allis known as maximum likelihood correspondence, in which one first 
determines the most likely value of the correspondence variable, and then 
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takes this value for granted. 

Maximum likelihood techniques are brittle if there are many equally likely 
hypotheses for the correspondence variable. However, one can often design 
the system for this not to be the case. To reduce the danger of asserting 
a false data association, there exist essentially two techniques: First, select 
landmarks that are sufficiently unique and sufficiently far apart from each 
other that confusing them with each other is unlikely. Second, make sure that 
the robot’s pose uncertainty remains small. Unfortunately, these two strate- 
gies are somewhat counter to each other, and finding the right granularity of 
landmarks in the environment can be somewhat of an art. 

Nevertheless, the maximum likelihood technique is of great practical im- 
portance. Table 7.3 depicts the EKF localization algorithm with a maximum 
likelihood estimator for the correspondence. The motion update in lines 2 
through 7 is identical to the one in Table 7.2. The key difference is in the mea- 
surement update: For each observation, we first calculate for all landmarks k 
in the map a number of quantities that enable us to determine the most likely 
correspondence (lines 10 through 15). The correspondence variable j(i) is 
then chosen in line 16, by maximizing the likelihood of the measurement 
zi given any possible landmark mk in the map. Note that this likelihood 
function is identical to the likelihood function used by the EKF algorithm for 
known correspondences. The EKF update in lines 18 and 19 only incorpo- 
rates the most likely correspondences. 

We note that the algorithm in Table 7.3 may not quite be as efficient as it 
could be. It can be improved through a more thoughtful selection of land- 
marks in line 10. In most settings, the robot only sees a small number of 
landmarks at a time in its immediate vicinity; and simple tests can reject a 
large number of unlikely landmarks in the map. 


Mathematical Derivation of the ML Data Association 


The maximum likelihood estimator determines the correspondence that 
maximizes the data likelihood. 
& = argmaxp(z | eio M, zia-i uis) 
Ct 

Here c; is the correspondence vector at time t. As before, the vector z; = 
1, 22,...) is the measurement vector that contains the list of features, or 
landmarks, zi, observed at time t. 

The argmax operator in (7.22) selects the correspondence vector 6; that 
maximizes the likelihood of the measurement. Note that this expression is 
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Table 7.3 The extended Kalman filter (EKF) localization algorithm with unknown 
correspondences. The correspondences j(i) are estimated via a maximum likelihood 


estimator. 
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conditioned on prior correspondences c;;.;. While those have been esti- 
mated in previous update steps, the maximum likelihood approach treats 
them as if they are always correct. This has two important ramifications: It 
makes it possible to update the filter incrementally. But it also introduces 
brittleness in the filter, which tends to diverge when correspondence esti- 
mates are erroneous. 

Even under the assumption of known prior correspondences, there are ex- 
ponentially many terms in the maximization (7.22). When the number of 
detected landmarks per measurement is large, the number of possible cor- 
respondences may grow too large for practical implementations. The most 
common technique to avoid such an exponential complexity performs the 
maximization separately for each individual feature zf in the measurement 
vector z;. We already derived the likelihood function for individual features 
in the derivation of the EKF localization algorithm for known correspon- 
dences. Following Equations (7.17) through (7.20), the correspondence of 
each feature follows as: 

&- argmax p(z; | ci, M, z14—1, u14) 


i 
t 


argmax N (z$; h(lis, ci, m), H,94 HI + Qi) 


C 


Q 


c 


This calculation is implemented in line 16 in Table 7.3. This component-wise 
optimization is “justified” only when we happen to know that individual 
feature vectors are conditionally independent—an assumption that is usu- 
ally adopted for convenience. Under this assumption, the term that is being 
maximized in (7.22) becomes a product of terms with disjoint optimization 
parameters, for which the maximum is attained when each individual factor 
is maximal, as determined in (7.23). Using this maximum likelihood data 
association, the correctness of the algorithm follows now directly from the 
correctness of the EKF localization algorithm with known correspondences. 


Multi-Hypothesis Tracking 


There exist a number of extensions of the basic EKF to accommodate situa- 
tions where the correct data association cannot be determined with sufficient 
reliability. Several of those techniques will be discussed later in this book, 
hence our exposition at this point will be brief. 

A classical technique that overcomes difficulties in data association is the 
multi-hypothesis tracking filter, or MHT. The MHT can represent a belief by 
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multiple Gaussians. It represents the posterior by the mixture 
1 
MEUS 


Here l is the index of the mixture component. Each such component, or 
“track” in MHT jargon, is itself a Gaussian with mean u and covariance 
X. The scalar v4; > 0 is a mixture weight. It determines the weight of the 
l-th mixture component in the posterior. Since the posterior is normalized by 
> Uu each y, is a relative weight, and the contribution of the /-th mixture 
component depends on the magnitude of all other mixture weights. 

As we shall see below when we describe the MHT algorithm, each mixture 
component relies on a unique sequence of data association decisions. Hence, 
it makes sense to write c; ; for the data association vector associated with the 
l-th track, and c;.;, for all past and present data associations associated with 
the l-th mixture component. With this notation, we can now think of mixture 
components as contributing local belief functions conditioned on a unique 
sequence of data associations: 





T 
bel(zi) = 5 Ypi ı det (27X71) 2 exp [-i( — Mea)” Yl s — in) 
l 


bel(xi) = p(z | zy Ure, cut) 


Here c14,4, = (0145, C2,1,---, C1} denotes the sequence of correspondence vec- 
tors associated with the l-th track. 

Before describing the MHT, it makes sense to discuss a completely in- 
tractable algorithm from which the MHT is derived. This algorithm is the 
full Bayesian implementation of the EKF under unknown data association. 
It is amazingly simple: Instead of selecting the most likely data association 
vector, our fictitious algorithm maintains them all. More specifically, at time 
t each mixture is split into many new mixtures, each conditioned on a unique 
correspondence vector c+. Let m be the index of one of the new Gaussians, 
and I be the index from which this new Gaussian is derived, for the corre- 
spondence c;,;. The weight of this new mixture is then set to 


Wem = VI plz | C1 1b Ctm Ž1:t—1; U1:t) 


This is the product of the mixture weight v;; from which the new com- 
ponent was derived, times the likelihood of the measurement z; under the 
specific correspondence vector that led to the new mixture component. In 
other words, we treat correspondences as latent variable and calculate the 
posterior likelihood that a mixture component is correct. A nice aspect of 
this approach is that we already know how to compute the measurement 
likelihood p(z | &14-1,5 Ctm; Z14—1, U1:t) in Equation (7.26): It is simply the 
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likelihood of the measurement computed in line 21 of the EKF localization al- 
gorithm for known data associations (Table7.2). Thus, we can incrementally 
calculate the mixture weights for each new component. The only downside 
of this algorithm is the fact that the number of mixture components, or tracks, 
grows exponentially over time. 

The MHT algorithm approximates this algorithm by keeping the number 
of mixture components small. This process is called pruning. Pruning termi- 
nates every component whose relative mixture weight 


Vel 
PM Vim 


is smaller than a threshold Win. It is easy to see that the number of mixture 
components is always at most v1. Thus, the MHT maintains a compact 
posterior that can be updated efficiently. It is approximate in that it maintains 
a very small number of Gaussians, but in practice the number of plausible 
robot locations is usually very small. 

We omit a formal description of the MHT algorithm at this point, and in- 
stead refer the reader to a large number of related algorithms in this book. 
We note than when implementing an MHT, it is useful to devise strategies 
for identifying low-likelihood tracks before instantiating them. 


UKF Localization 


UKF localization is a feature-based robot localization algorithm using the 
unscented Kalman filter. As described in Chapter 3.4, the UKF uses the un- 
scented transform to linearize the motion and measurement models. Instead 
of computing derivatives of these models, the unscented transform repre- 
sents Gaussians by sigma points and passes these through the models. Ta- 
ble 7.4 summarizes the UKF algorithm for landmark based robot localization. 
It assumes that only one landmark detection is contained in the observation 
z and that the identity of the landmark is known. 


Mathematical Derivation of UKF Localization 


The main difference between the localization version and the general UKF 
given in Table 3.4 is in the handling of prediction and measurement noise. 
Recall that the UKF in Table 3.4 is based on the assumption that prediction 
and measurement noise are additive. This made it possible to consider the 
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1: Algorithm UKF _localization(u:—1, X4 1, Ut, Zt, m): 
Generate augmented mean and covariance 


2: Mes aiv? + aw? 0 
i : 0 azv? + asw? 


c2 0 
TEE 


&  puLaí-(uL, (007 (0 0)7)7 


X41 0 0 
5: $1 一 0 Mi 0 
0 0 Q 


Generate sigma points 
6: XP = (Mia Hi1 ty Se s-i-o/MXLau) 

Pass sigma points through motion model and compute Gaussian statistics 
7: XP = glue + XF, XE) 





9: Be = it wh (AE, mua) 
Predict observations at sigma points and compute Gaussian statistics 
10: 2, = h(XP) + Af 
lis SS um 
1j S = 2 wl (Eu — 2t) (Zie - A) 
13: DP? YS wh? GG — fit) (Zin — 2e)" 
Update mean and covariance 
14: IGS er 5 
15: pu = fie Ki(zi— 4) 
16 De = Ye — Ki St KE 


i 
17: pz, —det(2-91) 2 exp i-i (a — 2AT Sz (a a &)) 
18: return Ht, Xt, Pz 











Table 7.4 The unscented Kalman filter (UKF) localization algorithm, formulated 
here for a feature-based map and a robot equipped with sensors for measuring range 
and bearing. This version handles single feature observations only and assumes 
knowledge of the exact correspondence. L is the dimensionality of the augmented 
state vector, given by the sum of state, control, and measurement dimensions. 


Srjs.cn 000000 


222 


(7.28) 


(7.29) 


(7.30) 
(7.31) 
(7.32) 


7 Mobile Robot Localization: Markov and Gaussian 


noise terms by simply adding their covariances R, and Q: to the predicted 
state and measurement uncertainty, respectively (lines 5 and 9 in Table 3.4). 

UKF localization provides an alternative, more accurate approach to con- 
sidering the impact of noise on the estimation process. The key "trick" is 
to augment the state with additional components representing control and 
measurement noise. The dimensionality L of the augmented state is given 
by the sum of the state, control, and measurement dimensions, which is 
34-242-— 7 in this case (the signature of feature measurements is ignored 
for simplicity). Since we assume zero-mean Gaussian noise, the mean /2 , 
of the augmented state estimate is given by the mean of the location esti- 
mate, 14; .1, and zero vectors for the control and measurement noise (line 4). 
The covariance X? , of the augmented state estimate is given by combin- 
ing the location covariance, X. ,, the control noise covariance, M;, and the 
measurement noise covariance, Q;, as done in line 5. 

The sigma point representation of the augmented state estimate is gen- 
erated in line 6, using Equation (3.66) of the unscented transform. In this 
example, X? , contains 2L + 1 = 15 sigma points, each having components 
in state, control, and measurement space: 


r T 
ma 
a Fe u 
Xi 1 m Xi T 
z 
Xi 


We choose mixed time indices to make clear that A7 , refers to x,_; and the 
control and measurement components refer to u, and z+, respectively. 

The location components A7 , of these sigma points are then passed 
through the velocity motion model g, defined in Equation (5.9). Line 7 per- 
forms this prediction step by applying the velocity motion model defined in 
Equation (5.13), using the control u, with the added control noise component 
Xia of each sigma point: 


Vi 


= ut sin 954-1 + E sin(0; 4-1 十 wi At) 








Wi,t 
Af = Ae sut cos 0; 4 1 — Us cos(0; 41 + wi At) 
WitAt 
where 
Vit = Vet a 
Wit = Wet ae 
it-1 = acm 
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are generated from the control w = (vi w+)! and the individual components 
of the sigma points. For example, ap represents the translational velocity 
v; of the i-th sigma point. The predicted sigma points, X7, are thus a set 
of robot locations, each resulting from a different combination of previous 
location and control. 

Lines 8 and 9 compute the mean and covariance of the predicted robot 
location, using the unscented transform technique. Line 9 does not require 
the addition of a motion noise term, which was necessary in the algorithm 
described in Table 3.4. This is due to the state augmentation, which results 
in predicted sigma points that already incorporate the motion noise. This 
fact additionally makes the redrawing of sigma points from the predicted 
Gaussian obsolete (see line 6 in Table 3.4). 

In line 10, the predicted sigma points are then used to generate measure- 
ment sigma points based on the measurement model defined in Equation 
(6.40) in Chapter 6.6: 


E I" pz] 2 M vy] 2 z[r] 
Zit = | yma Xit t (my — Xi) ) + | M 
t 


atan2(my 一 a mag 一 a = al) 





Observation noise is assumed to be additive in this case. 

The remaining updated steps are identical to the general UKF algorithm 
stated in Table 3.4. Lines 11 and 12 compute the mean and covariance of 
the predicted measurement. The cross-covariance between robot location 
and observation is determined in line 13. Lines 14 through 16 update the 
location estimate. The likelihood of the measurement is computed from the 
innovation and the predicted measurement uncertainty, just like in the EKF 
localization algorithm given in Table 7.2. 


Illustration 


We now illustrate the UKF localization algorithm using the same examples 
as were used for the EKF localization algorithm. The reader is encouraged to 
compare the following figures to the ones shown in Chapter 7.4.4. 


Prediction Step (Lines 2-9) Figure 7.12 illustrates the UKF prediction step 
for different motion noise parameters. The location components A7 , of the 
sigma points generated from the previous belief are indicated by the cross 
marks located symmetrically around 441. The 15 sigma points have seven 
different robot locations, only five of which are visible in this z-y-projection. 
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Figure 7.12 Prediction step of the UKF algorithm. The graphs were generated with 
different motion noise parameters. The robot's initial estimate is represented by the 
ellipse centered at /Mi-1， The robot moves on a circular arc of 90cm length while 
turning 45 degrees to the left. In panel (a), the motion noise is relatively small in both 
translation and rotation. The other panels represent (b) high translational noise, (c) 
high rotational noise, and (d) high noise in both translation and rotation. 


The additional two points are located "on top" and "below" the mean sigma 
point, representing different robot orientations. The arcs indicate the motion 
prediction performed in line 7. As can be seen, 11 different predictions are 
generated, resulting from different combinations of previous location and 
motion noise. The panels illustrate the impact of the motion noise on these 
updates. The mean ji, and uncertainty ellipse X, of the predicted robot loca- 
tion is generated from the predicted sigma points. 


Measurement Prediction (Lines 10-12) In the measurement prediction 
step, the predicted robot locations A7 are used to generate the measurement 
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Figure7.13 Measurement prediction. The left plots show the sigma points predicted 
from two motion updates along with the resulting uncertainty ellipses. The true robot 
and the observation are indicated by the white circle and the bold line, respectively. 
The panels on the right show the resulting measurement prediction sigma points. 
The white arrows indicate the innovations, the differences between observed and 
predicted measurements. 


sigma points Z, (line 10). The black cross marks in the left plots of Figure 7.13 
represent the location sigma points, and the white cross marks in the right 
plots indicate the resulting measurement sigma points. Note that the 11 dif- 
ferent location sigma points generate 15 different measurements, which is 
due to different measurement noise components X being added in line 10. 
The panels also show the mean 2; and uncertainty ellipse 5; of the predicted 
measurement, extracted in lines 11 and 12. 


Correction Step: Estimation Update (Lines 14-16) The correction step of 
the UKF localization algorithm is virtually identical to EKF correction step. 
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Figure 7.14 Correction step of the UKF algorithm. The panels on the left show the 
measurement prediction, and the panels on the right the resulting corrections, which 
update the mean estimate and reduce the position uncertainty ellipses. 


The innovation vector and the measurement prediction uncertainty are used 
to update the estimate, as indicated by the white arrow in Figure 7.14. 


Example Figure 7.15 shows a sequence of location estimates generated by 
a particle filter (upper right), the EKF (lower left), and the UKF (lower right). 
The upper left graph shows the robot's trajectory according to the motion 
control (dashed line) and the resulting true trajectory (solid line). Landmark 
detections are indicated by thin lines. The dashed lines in the other three pan- 
els show the paths estimated with the different techniques. The covariances 
of the particle filter estimates are extracted from the sample sets of a particle 
filter before and after the measurement update (see Table 8.2). The particle 
filter estimates are shown here as reference since the particle filter does not 
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Figure 7.15 Comparison of UKF and EKF estimates: (a) Robot trajectory according 
to the motion control (dashed lines) and the resulting true trajectory (solid lines). 
Landmark detections are indicated by thin lines. (b) Reference estimates, generated 
by a particle filter. (c) EKF and (d) UKF estimates. 


make any linearization approximations. As can be seen, the estimates of the 
EKF and the UKF are extremely close to these reference estimates, with the 
UKF being slightly closer. 

The impact of the improved linearization applied by the UKF is more 
prominent in the example shown in Figure 7.16. Here, a robot performs two 
motion controls along the circle indicated by the thin line. The panels show 
the uncertainty ellipses after the two motions (the robot makes no obser- 
vation). Again, the covariances extracted from exact, sample-based motion 
updates are shown as reference. The reference samples were generated us- 
ing algorithm sample motion model velocity in Table 5.3. While the EKF 
linearization incurs significant errors both in the location of the mean and 
in the "shape" of the covariance, the UKF estimates are almost identical to 
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Figure 7.16 Approximation error due to linearization. The robot moves on a cir- 
cle. Estimates based on (a) EKF prediction and (b) UKF prediction. The reference 
covariances are extracted from an accurate, sample-based prediction. 


the reference estimates. This example also shows a subtle difference between 
the EKF and the UKF prediction. The mean predicted by the EKF is always 
exactly on the location predicted from the control (line 6 in Table 7.2). The 
UKF mean, on the other hand, is extracted from the sigma points and can 
therefore deviate from the mean predicted by the control (line 7 in Table 7.4). 
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Practical Considerations 


The EKF localization algorithm and its close relative, MHT localization, are 
popular techniques for position tracking. There exist a large number of vari- 
ations of these algorithms that enhance their efficiency and robustness. 


Efficient search. First, it is often impractical to loop through all landmarks 
k in the map, as is done by our EKF localization algorithm for unknown 
correspondences. Often, there exist simple tests to identify plausible can- 
didate landmarks (e.g., by simply projecting the measurement into x-y- 
space), enabling one to rule out all but a constant number of candidates. 
Such algorithms can be orders of magnitude faster than our naive imple- 
mentation. 


Mutual exclusion. A key limitation of our implementation arises from 
our assumed independence of feature noise in the EKF (and, by inheri- 
tance, the MHT). The reader may recall condition (7.16), which enabled 
us to process individual features sequentially, thereby avoiding a poten- 
tial exponential search through the space of all correspondence vectors. 
Unfortunately, such an approach allows for assigning multiple observed 
features, say z} and zl with i Æ j, to be assigned to the same landmark in 
the map: ĉi = ¢}. For many sensors, such a correspondence assignment 
is wrong by default. For example, if the feature vector is extracted from a 
single camera image, we know by default that two different regions in the 
image space must correspond to different locations in the physical world. 
Put differently, we usually know that i 4 j — 6i # &. This (hard!) con- 
straint is called mutual exclusion principle in data association. It reduces the 
space of all possible correspondence vectors. Advanced implementations 
consider this constraint. For example, one might first search for each cor- 
respondence separately—as in our version of the EKF localizer—followed 
by a "repair" phase in which violations of the mutual exclusion principle 
are resolved by changing correspondence values accordingly. 


Outlier rejection. Further, our implementation does not address the is- 
sue of outliers. The reader may recall from Chapter 6.6 that we allow for 
a correspondence c = N + 1, with N being the number of landmarks in 
the map. Such an outlier test is quite easily added to the EKF localization 
algorithms. In particular, if we set 741 to be the a prior probability of 
an outlier, the argmax-step in line 16 of EKF localization (Table 7.3) can 
default to N + 1 if an outlier is the most likely explanation of the measure- 
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ment vector. Clearly, an outlier does not provide any information on the 
robot's pose; hence, the pose-related terms are simply omitted in lines 18 
and 19 in Table 7.3. 


EKF and UKF localization are only applicable to position tracking problems. 
In general, linearized Gaussian techniques tend to work well only if the po- 
sition uncertainty is small. There are several complimentary reasons for this 
observation: 


e A unimodal Gaussian is usually a good representation of uncertainty in 
tracking whereas it is not in more general global localization problems. 


。 Even during tracking, unimodal Gaussians are not well suited to repre- 
sent hard spatial constraints such as “the robot is close to a wall but can 
not be inside the wall”. The severity of this limitation increases with the 
uncertainty in the robot location. 


* A narrow Gaussian reduces the danger of erroneous correspondence de- 
cisions. This is important particularly for the EKF, since a single false 
correspondence can derail the tracker by inducing an entire stream of lo- 
calization and correspondence errors. 


* Linearization is usually only good in a close proximity to the linearization 
point. Asa rule of thumb, if the standard deviation for the orientation 0 is 
larger than +20 degrees, linearization effects are likely to make both the 
EKF and the UKF algorithms fail. 





The MHT algorithm overcomes most of these problems, at the cost of in- 
creased computational complexity. 


e [tcan solve the global localization problem by initializing the belief with 
multiple Gaussian hypotheses. The hypotheses can be initialized accord- 
ing to the first measurements. 


* The kidnapped robot problem can be addressed by injecting additional 
hypotheses into the mixture. 


e Hard spatial constraints are still hard to model, but can be approximated 
better using multiple Gaussians. 


* The MHT is more robust to the problem of erroneous correspondence, 
though it can fail equally when the correct correspondence is not among 
those maintained in the Gaussian mixture. 
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* The MHT discussed here applies the same linearization as the EKF and 
suffers thus from similar approximation effects. The MHT can also be 
implemented using a UKF for each hypothesis. 


The design of the appropriate features for Gaussian localization algorithms is 
a bit of an art. This is because multiple competing objectives have to be met. 
On the one hand, one wants sufficiently many features in the environment, so 
that the uncertainty in the robot's pose estimate can be kept small. Small un- 
certainty is absolutely vital for reasons already discussed. On the other hand, 
one wants to minimize chances that landmarks are confused with each other, 
or that the landmark detector detects spurious features. Many environments 
do not possess too many point landmarks that can be detected with high re- 
liability, hence many implementations rely on relatively sparsely distributed 
landmarks. Here the MHT has a clear advantage, in that it is more robust 
to data association errors. As a rule of thumb, large numbers of landmarks 
tend to work better than small numbers even for the EKF and the UKF. When 
landmarks are dense, however, it is critical to apply the mutual exclusion 
principle in data association. 

Finally, we note that EKF and UKF localization process only a subset of all 
information in the sensor measurement. By going from raw measurements to 
features, the amount of information that is being processed is already dras- 
tically reduced. Further, EKF and UKF localization are unable to process 
negative information. Negative information pertains to the absence of a fea- 
ture. Clearly, not seeing a feature when one expects to see it carries relevant 
information. For example, not seeing the Eiffel Tower in Paris implies that 
it is unlikely that we are right next to it. The problem with negative infor- 
mation is that it induces non-Gaussian beliefs, which cannot be represented 
by the mean and variance. For this reason, EKF and UKF implementations 
simply ignore the issue of negative information, and instead integrate only 
information from observed features. The standard MHT also avoids nega- 
tive information. However, it is possible to fold negative information into 
the mixture weight, by decaying mixture components that failed to observe 
a landmark. 

With all these limitations, does this mean that Gaussian techniques are 
brittle localization techniques? The answer is no. The EKF, the UKF, and 
especially the MHT are surprisingly robust to violations of the linear sys- 
tem assumptions. In fact, the key to successful localization lies in successful 
data association. Later in this book, we will encounter more sophisticated 
techniques for handling correspondences than the ones discussed thus far. 


»js.on 000000 


232 


7.9 


7 Mobile Robot Localization: Markov and Gaussian 


Many of these techniques are applicable (and will be applied) to Gaussian 
representations, and the resulting algorithms are often among the best ones 
known. 


Summary 


In this chapter, we introduced the mobile robot localization problem and de- 
vised a first practical algorithm for solving it. 


The localization problem is the problem of estimating a robot's pose rela- 
tive to a known map of its environment. 


Position tracking addresses the problem of accommodating the local un- 
certainty of a robot whose initial pose is known; global localization is the 
more general problem of localizing a robot from scratch. Kidnapping is a 
localization problem in which a well-localized robot is secretly teleported 
somewhere else without being told—it is the hardest of the three localiza- 
tion problems. 


The hardness of the localization problem is also a function of the degree to 
which the environment changes over time. All algorithms discussed thus 
far assume a static environment. 


Passive localization approaches are filters: they process data acquired by 
the robot but do not control the robot. Active techniques control the robot 
during localization, with the purpose of minimizing the robot’s uncer- 
tainty. So far, we have only studied passive algorithms. Active algorithms 
will be discussed in Chapter 17. 


Markov localization is just a different name for the Bayes filter applied to 
the mobile robot localization problem. 


EKF localization applies the extended Kalman filter to the localization 
problem. EKF localization is primarily applied to feature-based maps. 


The most common technique for dealing with correspondence problems 
is the maximum likelihood technique. This approach simply assumes that 
at each point in time, the most likely correspondence is correct. 


The multi hypothesis tracking algorithm (MHT) pursues multiple corre- 
spondences, using a Gaussian mixture to represent the posterior. Mixture 
components are created dynamically, and terminated if their total likeli- 
hood sinks below a user-specified threshold. 
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* The MHT is more robust to data association problems than the EKF, at an 
increased computational cost. The MHT can also be implemented using 
UKF's for the individual hypotheses. 


e UKF localization uses the unscented transform to linearize the motion and 
measurement models in the localization problem. 


* All Gaussian filters are well-suited for local position tracking problems 
with limited uncertainty and in environments with distinct features. The 
EKF and UKF are less applicable to global localization or in environments 
where most objects look alike. 


* Selecting features for Gaussian filters requires skill. Features must be suf- 
ficiently unambiguous to minimize the likelihood of confusing them, and 
there must be enough of them that the robot frequently encounters fea- 
tures. 


* The performance of Gaussian localization algorithms can be improved by 
a number of measures, such as enforcing mutual exclusion in data associ- 
ation. 


In the next chapter, we will discuss alternative localization techniques that 
aim at dealing with the limitations of the EKF by using different representa- 
tions of the robot's belief. 


Bibliographical Remarks 


Localization has been dubbed "the most fundamental problem to providing a mobile robot with 
autonomous capabilities" (Cox 1991). The use of EKF for state estimation in outdoor robotics 
was pioneered by Dickmanns and Graefe (1988), who used EKFs to estimate highway curvature 
from camera images. Much of the early work on indoor mobile robot localization is surveyed 
in Borenstein et al. (1996) (see also (Feng et al. 1994)). Cox and Wilfong (1990) provides an early 
text on the state of the art in mobile robotics, which also covers localization. Many of the early 
techniques required environmental modifications, such as through artificial beacons. For exam- 
ple, Leonard and Durrant-Whyte (1991) used EKFs when matching geometric beacons extracted 
from sonar scans with beacons predicted from a geometric map of the environment. The prac- 
tice of using artificial markers continues to the present day (Salichs et al. 1999), since often the 
modification of the environment is both feasible and economical. Other early researchers used 
lasers to scan unmodified environments (Hinkel and Knieriemen 1988). 

Moving away from point features, a number of researchers developed more geometric tech- 
niques for localization. For example, Cox (1991) developed an algorithm for matching distances 
measured by infrared sensors, and line segment descriptions of the environment. An approach 
by Weiss et al. (1994) correlated range measurements for localization. The idea of map matching— 
specifically the comparison of a local occupancy grid map with a global environment map—is 
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due to Moravec (1988). A gradient descent localizer based on this idea was described in Thrun 
(1993), and used at the first AAAI competition in 1992 (Simmons et al. 1992). Schiele and Crow- 
ley (1994) systematically compared different strategies to track the robot’s position based on 
occupancy grid maps and ultrasonic sensors. They showed that matching local occupancy grid 
maps with a global grid map results in a similar localization performance as if the matching is 
based on features that are extracted from both maps. Shaffer et al. (1992) compare the robustness 
of map-matching and feature-based techniques, showing that combinations of both yielded the 
best empirical results. Yamauchi and Langley (1997) showed the robustness of map matching 
to environmental change. The idea of using scan matching for localization in robotics goes back 
to Lu and Milios (1994); Gutmann and Schlegel (1996); Lu and Milios (1998), although the basic 
principle had been popular in other fields (Besl and McKay 1992). A similar technique was pro- 
posed by Arras and Vestli (1998), who showed that scan matching made it possible to localize a 
robot with remarkable accuracy. Ortin et al. (2004) found that using camera data along a laser 
stripe increases the robustness of range scan matching. 

A different strain of research investigated geometric techniques for localization (Betke and 
Gurvits 1994). The term “kidnapped robot problem” goes back to Engelson and McDermott 
(1992). The name “Markov localization” is due to Simmons and Koenig (1995), whose localizer 
used a grid to represent posterior probabilities. However, the intellectual roots of this work 
goes back to Nourbakhsh et al. (1995), who developed the idea of “certainty factors” for mo- 
bile robot localizations. While the update rules for certainty factors did not exactly follow the 
laws of probability, they captured the essential idea of multi-hypothesis estimation. A seminal 
paper by Cox and Leonard (1994) also developed this idea, by dynamically maintaining trees 
of hypotheses for a localizing robot. Fuzzy logic has been proposed for mobile localization by 
Saffiotti (1997); see also Driankov and Saffiotti (2001). 


Exercises 


1. Suppose a robot is equipped with a sensor for measuring range and bear- 
ing to a landmark; and for simplicity suppose that the robot can also sense 
the landmark identity (the identity sensor is noise-free). We want to per- 
form global localization with EKFs. When seeing a single landmark, the 
posterior is usually poorly approximated by a Gaussian. However, when 
sensing two or more landmarks at the same time, the posterior is often 
well-approximated with a Gaussian. 


(a) Explain why. 


(b) Given k simultaneous measurements of ranges and bearings of k iden- 
tifiably landmarks, devise a procedure for calculating a Gaussian pose 
estimate for the robot under uniform initial prior. You should start 
with the range/bearing measurement model provided in Chapter 6.6. 


2. In this question we seek to design hard environments for global local- 
ization. Suppose we can compose a planar environment out of n non- 
intersecting straight line segments. The free space in the environment 
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has to be confined; however, there might be island of occupied terrain in- 
side the map. For the sake of this exercise, we assume that the robot is 
equipped with a circular array of 360 range finders, and that these finders 
never err. 


(a) What is the maximum number of distinct modes that a globally local- 
izing robot might encounter in its belief function? For n — 3,...,8, 
draw examples of worst-case environments, along with a plausible be- 
lief that maximizes the number of modes. 


(b) Does your analysis change if the range finders are allowed to err? In 
particular, can you give an example for n — 4 in which the number of 
plausible modes is larger than the ones derived above? Show such an 
environment along with the (erroneous) range scan and the posterior. 


3. You are requested to derive an EKF localization algorithm for a simplistic 
underwater robot. The robot lives in a 3-D space and is equipped with 
a perfect compass (it always knows its orientation). For simplicity, we 
assume the robot move independently in all three Cartesian directions (x, 
y, and z), by setting velocities 7,9,2. Its motion noise is Gaussian and 
independent for all directions. 


The robot is surrounded by a number of beacons that emit acoustic sig- 
nals. The emission time of each signal is known, and the robot can de- 
termine from each signal the identity of the emitting beacon (hence there 
is no correspondence problem). The robot also knows the location of all 
beacons, and it is given an accurate clock to measure the arrival time of 
each signal. However, the robot cannot sense the direction from which it 
received a signal. 


(a) You are asked to devise an EKF localization algorithm for this robot. 
This involves a mathematical derivation of the motion and the mea- 
surement model, along with the Taylor approximation. It also involves 
the statement of the final EKF algorithm, assuming known correspon- 
dence. 

(b) Implement your EKF algorithm and a simulation of the environment. 
Investigate the accuracy and the failure modes of the EKF localizer in 
the context of the three localization problems: global localization, po- 
sition tracking, and the kidnapped robot problem. 


4. Consider a simplified global localization in any of the following six grid- 
style environments: 
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In each environment, the robot will be placed at the random position fac- 
ing North. You are devised to come up with an open-loop localization 
strategy that contains a sequence of the following commands 


Action L: Turn left 90 degrees. 
Action R: Turn right 90 degrees. 
Action M: Move forward until you hit an obstacle. 


At the end of this strategy, the robot must be at a predictable location. 
For each such environment, provide a shortest such sequence (only “M” 
actions count). State where the robot will be after executing your action 
sequence. If no such sequence exists, explain why. 


. Now assume the robot can sense the number of steps it takes while exe- 


cuting an “M” action in the previous exercise. What will be the shortest 
open-loop sequence for the robot to determine its location? Explain your 
answer. 


Notice: For this question, it might happen that the final location of the 
robot is a function of its starting location. All we ask here is that the robot 
localize itself. 
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Introduction 


This chapter describes two localization algorithms that are capable of solv- 
ing global localization problems. The algorithms discussed here possess a 
number of differences to the unimodal Gaussian techniques discussed in the 
previous chapter. 


* They can process raw sensor measurements. There is no need to extract 
features from sensor values. As a direct implication, they can also process 
negative information. 


* They are non-parametric. In particular, they are not bound to a unimodal 
distribution as was the case with the EKF localizer. 


* They can solve global localization and—in some instances—kidnapped 
robot problems. The EKF algorithm is not able to solve such problems— 
although the MHT (multi-hypothesis tracking) can be modified so as to 
solve global localization problems. 


The techniques presented here have exhibited excellent performance in a 
number of fielded robotic systems. 

The first approach is called grid localization. It uses a histogram filter to 
represent the posterior belief. A number of issues arise when implement- 
ing grid localization: with a fine-grained grid, the computation required for 
a naive implementation may make the algorithm intolerably slow. With a 
coarse grid, the additional information loss through the discretization neg- 
atively affects the filter and—if not properly treated—may even prevent the 
filter from working. 
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1: Algorithm Grid_localization({p; 1-1}, we, zt, m): 
for allk do 
3: Pkt = >》 pini motion_model(mean(x,), u;, mean(x;)) 
4: Prt = Dy, measurement model(z;, mean(xy.), m) 
5: endfor 
6: return {pp t} 











Table 8.1 Grid localization, a variant of the discrete Bayes filter. The function mo- 
tion_model implements one of the motion models, and measurement_model a sen- 
sor model. The function “mean” returns the center-of-mass of a grid cell xx. 


The second approach is the Monte Carlo localization (MCL) algorithm, ar- 
guably the most popular localization algorithm to date. It uses particle filters 
to estimate posteriors over robot poses. A number of shortcomings of the 
MCL are discussed, and techniques for applying it to the kidnapped robot 
problem and to dynamic environments are presented. 


Grid Localization 


Basic Algorithm 


Grid localization approximates the posterior using a histogram filter over a grid 
decomposition of the pose space. The discrete Bayes filter was already exten- 
sively discussed in Chapter 4.1 and is depicted in Table 4.1. It maintains as 
posterior a collection of discrete probability values 


bel(az) = {pre} 


where each probability px is defined over a grid cell xk. The set of all grid 
cells forms a partition of the space of all legitimate poses: 


domain(X;) = xi4UXo4U...XKk, 


In the most basic version of grid localization, the partitioning of the space of 
all poses is time-invariant, and each grid cell is of the same size. A common 
granularity used in many of the indoor environments is 15 centimeters for 
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the x- and y-dimensions, and 5 degrees for the rotational dimension. A finer 
representation yields better results, but at the expense of increased computa- 
tion. 

Grid localization is largely identical to the basic histogram filter from 
which it is derived. Table 8.1 provides pseudo-code for the most basic im- 
plementation. It requires as input the discrete probability values {p:—1,x}, 
along with the most recent measurement, control, and the map. Its inner 
loop iterates through all grid cells. Line 3 implements the motion model 
update, and line 4 the measurement update. The final probabilities are nor- 
malized through the normalizer 7 in line 4. The functions motion model, 
and measurement model, may be implemented by any of the motion mod- 
els in Chapter 5, and measurement models in Chapter 6, respectively. The 
algorithm in Table 8.1 assumes that each cell possesses the same volume. 

Figure 8.1 illustrates grid localization in our one-dimensional hallway ex- 
ample. This diagram is equivalent to that of the general Bayes filter, except 
for the discrete nature of the representation. As before, the robot starts out 
with global uncertainty, represented by a uniform histogram. As it senses, 
the corresponding grid cells raise their probability values. The example high- 
lights the ability to represent multi-modal distributions with grid localiza- 
tion. 


Grid Resolutions 


A key variable of the grid localizer is the resolution of the grid. On the sur- 
face, this might appear to be a minor detail; however, the type of sensor 
model that is applicable, the computation involved in updating the belief, 
and the type results to expect all depend on the grid resolution. 

At the extreme end are two types of representations, both of which have 
been brought to bear successfully in fielded robotics systems. 

A common approach to defining a grid is topological; the resulting grids 
tend to be extremely coarse, and their resolution tends to be influenced by 
the structure of the environment. Topological representations decompose 
the space of all poses into regions that correspond to significant places in the 
environment. Such places may be defined by the presence (or absence) of 
specific landmarks, such as doors and windows. In hallway environments, 
places may correspond to intersections, T-junctions, dead ends, and so on. 
Topological representations tend to be coarse, and their environment decom- 
position depends on the structure of the environment. Figure 8.5 shows such 
a coarse representation for the one-dimensional hallway example. 
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(d) 


(e) 


Figure 8.1 Grid localization using a fine-grained metric decomposition. Each pic- 
ture depicts the position of the robot in the hallway along with its belief bel(z:), 


represented by a histogram over a grid. 
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Figure 8.2 Example of a fixed-resolution grid over the robot pose variables x, y, 
and 0. Each grid cell represents a robot pose in the environment. Different orien- 
tations of the robot correspond to different planes in the grid (shown are only three 
orientations). 


A much finer grained representation is commonly found through metric 
representations. Such representations decompose the state space into fine- 
grained cells of uniform size. The resolution of such decompositions is usu- 
ally much higher than that of topological grids. For example, some of the 
examples in Chapter 7 use grid decompositions with cell sizes of 15 centime- 
ters or less. Hence, they are more accurate, but at the expense of increased 
computational costs. Figure 8.2 illustrates such a fixed-resolution grid. Fine 
resolution like these are commonly associated with metric representation of 
space. 

When implementing grid localization for coarse resolutions, it is important 
to compensate for the coarseness in the resolution in the sensor and motion 
models. In particular, for a high-resolution sensor like a laser range finder, 
the value of the measurement model p(z; | x+) may vary drastically inside 
each grid cell xx. If this is the case, just evaluating it at the center-of-mass 
will generally yield poor results. Similarly, predicting robot motion from 
the center-of-mass may yield poor results: If the motion is updated in 1- 
second intervals for a robot moving at 10cm/sec, and the grid resolution is 1 
meter, the naive implementation will never result in a state transition! This 
is because any location that is approximately 10cm away from the center-of- 
mass of a grid cell still falls into the same grid cell. 
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Figure 8.3 Average localization error as a function of grid cell size, for ultrasound 
sensors and laser range-finders. 


A common way to compensate this effect is to modify both the measure- 
ment and the motion model by inflating the amount of noise. For example, 
the variance of a range finder model’s main Gaussian cone may be enlarged 
by half the diameter of the grid cell. In doing so, the new model is much 
smoother, and its interpretation will be less susceptible to the exact location 
of the sample point relative to the correct robot location. However, this mod- 
ified measurement model reduces the information extracted from the sensor 
measurements, 

Similarly a motion model may predict a random transition to a nearby cell 
with a probability that is proportional to the length of the motion arc, di- 
vided by the diameter of a cell. The result of such an inflated motion model 
is that the robot can indeed move from one cell to another, even if its mo- 
tion between consecutive updates is small relative to the size of a grid cell. 
However, the resulting posteriors are wrong in that an unreasonably large 
probability will be placed on the hypothesis that the robot changes cell at 
each motion update—and hence moves much faster than commanded. 

Figures 8.3 and 8.4 plot the performance of grid localization as a function 
of the resolution, for two different types of range sensors. As to be expected, 
the localization error increases as the resolution decreases. The total time 
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Figure 8.4 Average CPU-time needed for global localization as a function of grid 
resolution, shown for both ultrasound sensors and laser range-finders. 


necessary to localize a robot decreases as the grid becomes coarser, as shown 
in Figure 8.4. 


Computational Considerations 


When using a fine-grained grid such as some of the metric grids described in 
the previous section, the basic algorithm cannot be executed in real-time. At 
fault are both the motion and the measurement update. The motion update 
requires a convolution, which for a 3-D grid is a 6-D operation. The measure- 
ment update is a 3-D operation, but calculating the likelihood of a full scan 
is a costly operation. 

There exist a number of techniques to reduce the computational complex- 
ity of grid localization. Model pre-caching pays tribute to the fact that certain 
measurement models are expensive to compute. For example, the calculation 
of a measurement model may require ray casting, which can be pre-cached 
for any fixed map. As motivated in Chapter 6.3.4, a common strategy is to 
calculate for each grid cell essential statistics that facilitate the measurement 
update. In particular, when using a beam model, it is common to cache away 
the correct range for each grid cell. Further, the sensor model can be pre- 
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Figure 8.5 Application of a coarse-grained, topological representation to mobile 
robot localization. Each state corresponds to a distinctive place in the environment 
(a door in this case). The robot's belief bel(z;) of being in a state is represented by 
the size of the circles. (a) The initial belief is uniform over all poses. (b) shows the 
belief after the robot made one state transition and detected a door. At this point, it is 
unlikely that the robot is still in the left position. 


calculated for a fine-grained array of possible ranges. The calculation of the 
measurement model reduces then to two table lookups, which is much faster. 

Sensor subsampling achieves further speed-ups by evaluating the measure- 
ment model only for a subset of all ranges. In some of our systems, we use 
only 8 of our 360 laser range measurement and still achieve excellent results. 
Subsampling can take place spatially and in time. 

Delayed motion updates applies the motion update at lower frequency than 
the control or measurement frequency of the robot. This is achieved by ge- 
ometrically integrating the controls or odometry readings over a short time 
period. A good delayed motion update technique can easily speed up the 
algorithm by an order of magnitude. 

Selective updating was already described in Chapter 4.1.4. When updat- 
ing the grid, selective techniques update a fraction of all grid cells only. A 
common implementation of this idea updates only those grid cells whose 
posterior probability exceeds a user-specified threshold. Selective updating 
techniques can reduce the computational effort involved in updating beliefs 
by many orders of magnitude. Special care has to be taken to reactivate low- 
likelihood grid cells when one seeks to apply this approach to the kidnapped 
robot problem. 
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With these modifications, grid localization can in fact become quite effi- 
cient; even 10 years ago, low-end PCs were fast enough to generate the re- 
sults shown in this chapter. However, our modifications place an additional 
burden on the programmer and make a final implementation more complex 
than the short algorithm in Table 8.1 suggests. 


Illustration 


Figure 8.6 shows an example of Markov localization with metric grids, at a 
spatial resolution of 15 centimeters and an angular resolution of 5 degrees. 
Shown there is a global localization run where a mobile robot equipped with 
two laser range-finders localizes from scratch. The probabilistic model of the 
range-finders is computed by the beam model described in Chapter 6.3 and 
depicted in Table 8.1. 

Initially, the robot's belief is uniformly distributed over the pose space. 
Figure 8.6a depicts a scan of the laser range-finders taken at the start position 
of the robot. Here, max range measurements are omitted and the relevant 
part of the map is shaded in gray. After incorporating this sensor scan, the 
robot's location is focused on just a few regions in the (highly asymmetric) 
space, as shown by the gray-scale in Figure 8.6b. Notice that beliefs are pro- 
jected into x-y space; the true belief is defined over a third dimension, the 
robot's orientation 0, which is omitted in this and the following diagrams. 
Figure 8.6d shows the belief after the robot moved 2m, and incorporated the 
second range scan shown in Figure 8.6c. The certainty in the position esti- 
mation increases and the global maximum of the belief already corresponds 
to the true location of the robot. After integrating another scan into the belief 
the robot finally perceives the sensor scan shown in Figure 8.6e. Virtually all 
probability mass is now centered at the actual robot pose (see Figure 8.6f). 
Intuitively, we say that the robot successfully localized itself. This example 
illustrates that grid localization is capable to globally localize a robot effi- 
ciently. 

A second example is shown in Figure 8.7; see the figure caption for an ex- 
planation. Here the environment is partially symmetric, which causes sym- 
metric modes to appear in the localization process. 

Of course, global localization usually requires more than just a few sensor 
scans to succeed. This is particularly the case in symmetric environments, 
and if the sensors are less accurate than laser sensors. Figures 8.8 to 8.10 il- 
lustrate global localization using a mobile robot equipped with sonar sensors 
only, and in an environment that possesses many corridors of approximately 
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Figure 8.6 Global localization in a map using laser range-finder data. (a) Scan of 
the laser range-finders taken at the start position of the robot (max range readings are 
omitted). Figure (b) shows the situation after incorporating this laser scan, starting 
with the uniform distribution. (c) Second scan and (d) resulting belief. After integrat- 
ing the final scan shown in (e), the robot's belief is centered at its actual location (see 


(f). 
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(a) Path and reference poses (b) Belief at reference pose 1 











(c) Belief at reference pose 2 (d) Belief at reference pose 3 








(e) Belief at reference pose 4 (f) Belief at reference pose 5 














Figure 8.7 Global localization in an office environment using sonar data. (a) Path 
of the robot. (b) Belief as the robot passes position 1. (c) After some meters of robot 
motion, the robot knows that it is in the corridor. (d) As the robot reaches position 3 it 
has scanned the end of the corridor with its sonar sensors and hence the distribution is 
concentrated on two local maxima. While the maximum labeled I represents the true 
location of the robot, the second maximum arises due to the symmetry of the corridor 
(position II is rotated by 180° relative to position I). (e) After moving through Room 
A, the probability of being at the correct position I is now higher than the probability 
of being at position II. (f) Finally the robot's belief is centered on the correct pose. 
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Figure 8.8 Occupancy grid map of the 1994 AAAI mobile robot competition arena. 


the same width. An occupancy grid map is shown in Figure 8.8. Figure 8.9a 
shows the data set, obtained by moving along one of the corridors and then 
turning into another. Each of the measurement beams in Figure 8.9a corre- 
sponds to a sonar measurement. In this particular environment, the walls are 
smooth and a large fraction of sonar readings are corrupted. Again, the prob- 
abilistic model of the sensor readings is the beam-based model described in 
Chapter 6.3. Figure 8.9 additionally shows the belief for three different points 
in time, marked “A,” “B,” and “C” in Figure 8.9a. After moving approxi- 
mately three meters, during which the robot incorporates 5 sonar scans, the 
belief is spread almost uniformly along all corridors of approximately equal 
size, as shown in Figure 8.9b. A few seconds later, the belief is now focused 
on a few distinct hypotheses, as depicted in Figure 8.9c. Finally, as the robot 
turns around the corner and reaches the point marked “C,” the sensor data is 
now sufficient to uniquely determine the robot’s position. The belief shown 
in Figure 8.9d is now closely centered around the actual robot pose. This ex- 
ample illustrates that the grid representation works well for high-noise sonar 
data and in symmetric environments, where multiple hypotheses have to be 
maintained during global localization. 

Figure 8.10 illustrates the ability of the grid approach to correct accumu- 
lated dead-reckoning errors by matching sonar data with occupancy grid 
maps. Figure 8.10a shows the raw odometry data of a 240m long trajectory. 
Obviously, the rotational error of the odometry quickly increases. After trav- 
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Figure 8.9 (a) Data set (odometry and sonar range scans) collected in the environ- 
ment shown in Figure 8.8. This data set is sufficient for global localization using the 
grid localization. The beliefs at the points marked "A," "B" and "C" are shown in (b), 
(c), and (d). 


eling only 40m, the accumulated error in the orientation (raw odometry) is 
about 50 degrees. Figure 8.10b shows the path of the robot estimated by the 
grid localizer. 

Obviously, the resolution of the discrete representation is a key param- 
eter for grid Markov localization. Given sufficient computing and mem- 
ory resources, fine-grained approaches are generally preferable over coarse- 
grained ones. In particular, fine-grained approaches are superior to coarse- 
grained approaches, assuming that sufficient computing time and memory 
is available. As we already discussed in Chapter 2.44, the histogram repre- 
sentation causes systematic error that may violate the Markov assumption 
in Bayes filters. The finer the resolution, the less error is introduced, and the 
better the results. Fine-grained approximations also tend to suffer less from 
catastrophic failures where the robot's belief differs significantly from its actual 
position. 
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Figure 8.10 (a) Odometry information and (b) corrected path of the robot. 


Monte Carlo Localization 


We now turn our attention to a popular localization algorithm that represents 
the belief bel (x+) by particles. The algorithm is called Monte Carlo Localization, 
or MCL. Like grid-based Markov localization, MCL is applicable to both local 
and global localization problems. Despite its relatively young age, MCL has 
already become one of the most popular localization algorithms in robotics. 
It is easy to implement and tends to work well across a broad range of local- 
ization problems. 


Illustration 


Figure 8.11 illustrates MCL using the one-dimensional hallway example. The 
initial global uncertainty is achieved through a set of pose particles drawn at 
random and uniformly over the entire pose space, as shown in Figure 8.11a. 
As the robot senses the door, MCL assigns importance factors to each parti- 
cle. The resulting particle set is shown in Figure 8.11b. The height of each 
particle in this figure shows its importance weight. It is important to notice 
that this set of particles is identical to the one in Figure 8.11a—the only thing 
modified by the measurement update are the importance weights. 

Figure 8.11c shows the particle set after resampling and after incorporat- 
ing the robot motion. This leads to a new particle set with uniform impor- 
tance weights, but with an increased number of particles near the three likely 
places. The new measurement assigns non-uniform importance weights to 
the particle set, as shown in Figure 8.11d. At this point, most of the cu- 
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Figure 8.11 Monte Carlo Localization, a particle filter applied to mobile robot local- 
ization. 
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Algorithm MCL(YU 1, uz, zt, m): 
X= =90 
for m = 1 to M do 


Eu = sample motion model(u;, anl) 


wl = measurement model(z;, ol"), m) 
A, = A + (al), wi!) 
endfor 
form = 1 to M do 
draw i with probability cx wi! 
add x” to x, 
endfor 
return X; 
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Table 8.2 MCL, or Monte Carlo Localization, a localization algorithm based on par- 
ticle filters. 


mulative probability mass is centered on the second door, which is also the 
most likely location. Further motion leads to another resampling step, and a 
step in which a new particle set is generated according to the motion model 
(Figure 8.11e). As should be obvious from this example, the particle sets ap- 
proximate the correct posterior, as would be calculated by an exact Bayes 
filter. 


The MCL Algorithm 


Table 8.2 shows the basic MCL algorithm, which is obtained by substituting 
the appropriate probabilistic motion and perceptual models into the algo- 
rithm particle filters (Table 4.3 on page 98). The basic MCL algorithm repre- 
sents the belief bel(z,) by a set of M particles A, = fal! ; all sees ol) }. Line 4 
in our algorithm (Table 8.2) samples from the motion model, using particles 
from present belief as starting points. The measurement model is then ap- 
plied in line 5 to determine the importance weight of that particle. The initial 
belief bel(xo) is obtained by randomly generating M such particles from the 
prior distribution p(xo), and assigning the uniform importance factor M^! 
to each particle. As in grid localization, the functions motion model, and 
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measurement_model, may be implemented by any of the motion models in 
Chapter 5, and measurement models in Chapter 6, respectively. 


Physical Implementations 


It is straightforward to implement the MCL algorithm for the landmark- 
based localization scenario of Chapter 7. To do so, the sam- 
pling procedure in line 4 is implemented using the algorithm sam- 
ple motion model velocity given in Table 5.3. The algorithm land- 
mark model known correspondence given in Table 6.4 provides the likeli- 
hood model used in line 5 to weigh the predicted samples. 

Figure 8.12 illustrates this version of the MCL algorithm. The scenario is 
identical to the one shown in Figure 7.15. For convenience, the illustration 
of the robot path and measurements is shown again in the upper left figure. 
The lower plot shows a sequence of sample sets generated by the MCL al- 
gorithm. The solid line represents the true path of the robot, the dotted line 
represents the path based on the control information, and the dashed line 
represents the mean path estimated by the MCL algorithm. Predicted sam- 
ple sets A at different points in time are shown in dark, the samples +; after 
the resampling steps are shown in lighter gray. Each particle set is defined 
over the 3-dimensional pose space, although only the x- and y-coordinates 
of each particle are shown. The means and covariances extracted from these 
sets are shown in the upper right figure. 

Figure 8.13 shows the result of applying MCL in an actual office envi- 
ronment, for a robot equipped with an array of sonar range finders. This 
version of MCL computes the likelihood of measurements using algorithm 
beam range finder model given in Table 6.1. The figure depicts particle 
sets after 5, 28, and 55, meters of robot motion, respectively. A third illus- 
tration is provided in Figure 8.14, here using a camera pointed towards the 
ceiling, and a measurement model that relates the brightness in the center of 
the image to a previously acquired ceiling map. 








Properties of MCL 


MCL can approximate almost any distribution of practical importance. It is 
not bound to a limited parametric subset of distributions, as was the case for 
EKF localization. Increasing the total number of particles increases the accu- 
racy of the approximation. The number of particles M is a parameter that 
enables the user to trade off the accuracy of the computation and the compu- 
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Figure 8.12 The MCL algorithm for landmark-based localization. (a) Robot trajec- 
tory according to the motion control (dashed lines) and the resulting true trajectory 
(solid lines). Landmark detections are indicated by thin lines. (b) Covariances of 
sample sets before and after resampling. (c) Sample sets before and after resampling. 


tational resources necessary to run MCL. A common strategy for setting M 
is to keep sampling until the next pair w and z; has arrived. In this way, the 
implementation is adaptive with regards to the computational resources: the 
faster the underlying processor, the better the localization algorithm. How- 
ever, as we will see in Chapter 8.3.7, care has to be taken that the number of 
particles remains high enough to avoid filter divergence. 

A final advantage of MCL pertains to the non-parametric nature of the 
approximation. As our illustrative results suggest, MCL can represent com- 
plex multi-modal probability distributions, and blend them seamlessly with 
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Figure 8.13 Illustration of Monte Carlo localization: Shown here is a robot operating 
in an office environment of size 54m x 18m. (a) After moving 5m, the robot is still 
globally uncertain about its position and the particles are spread through major parts 
of the free-space. (b) Even as the robot reaches the upper left corner of the map, its 
belief is still concentrated around four possible locations. (c) Finally, after moving 
approximately 55m, the ambiguity is resolved and the robot knows where it is. All 
computation is carried out in real-time on a low-end PC. 
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Figure 8.14 Global localization using a camera pointed at the ceiling. 


focused Gaussian-style distributions. 


Random Particle MCL: Recovery from Failures 


MCL, in its present form, solves the global localization problem but cannot 
recover from robot kidnapping, or global localization failures. This is quite 
obvious from the results in Figure 8.13: As the position is acquired, particles 
at places other than the most likely pose gradually disappear. At some point, 
particles only “survive” near a single pose, and the algorithm is unable to 
recover if this pose happens to be incorrect. 

This problem is significant. In practice, any stochastic algorithm such as 
MCL may accidentally discard all particles near the correct pose during the 
resampling step. This problem is particularly paramount when the number 
of particles is small (e.g., M = 50), and when the particles are spread over a 
large volume (e.g., during global localization). 

Fortunately, this problem can be solved by a rather simple heuristic. The 
idea of this heuristic is to add random particles to the particle sets, as already 
discussed in Chapter 4.3.4. Such an injection of random particles can be justi- 
fied mathematically by assuming that the robot might get kidnapped with a 
small probability, thereby generating a fraction of random states in the mo- 
tion model. Even if the robot does not get kidnapped, however, the random 
particles add an additional level of robustness. 

The approach of adding particles raises two questions. First, how many 
particles should be added at each iteration of the algorithm and, second, from 
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which distribution should we generate these particles? One might add a 
fixed number of random particles at each iteration. A better idea is to add 
particles based on some estimate of the localization performance. 

One way to implement this idea is to monitor the probability of sensor 
measurements 


plz: | 21:t—15 U1:t, m) 


and relate it to the average measurement probability (which is easily learned 
from data). In particle filters, an approximation to this quantity is easily ob- 
tained from the importance factor, since, by definition, an importance weight 
is a stochastic estimate of this probability. The average value 


lO^ f 
M 5 Ut & plz | 214-1, uam) 
m=1 

approximates the desired probability as stated. It is usually a good idea to 
smooth this estimate by averaging it over multiple time steps. There exist 
multiple reasons why the measurement probability may be low, besides a 
localization failure. The amount of sensor noise might be unnaturally high, 
or the particles may still be spread out during a global localization phase. 
For these reasons, it is a good idea to maintain a short-term average of the 
measurement likelihood, and relate it to the long-term average when deter- 
mining the number of random samples. 

The second problem of determining which sample distribution to use can 
be addressed in two ways. One can draw particles according to a uniform 
distribution over the pose space and weight them with the current observa- 
tion. 

For some sensor models, however, it is possible to generate particles di- 
rectly in accordance to the measurement distribution. One example of such 
a sensor model is the landmark detection model discussed in Chapter 6.6. 
In this case the additional particles can be placed directly at locations dis- 
tributed according to the observation likelihood (see Table 6.5). 

Table 8.3 shows a variant of the MCL algorithm that adds random par- 
ticles. This algorithm is adaptive, in that it tracks the short-term and the 
long-term average of the likelihood p(z; | 2141, ui, m). Its first part is iden- 
tical to the algorithm MCL in Table 8.2: New poses are sampled from old 
particles using the motion model (line 5), and their importance weight is set 
in accordance to the measurement model (line 6). 

Augmented_MCL calculates the empirical measurement likelihood in line 
8, and maintains short-term and long-term averages of this likelihood in lines 
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1 Algorithm Augmented_MCL(4;_1, ut, zt, m): 

2 static Wslow, Wfast 

3 Æ, = Xi — 0 

4 form = 1 to M do 

5: Eu — sample motion model(u;, ail) 

6 wl”! = measurement model(z;, ol) m) 

7 y= X+ (al), wl!) 

8 Wave = Wave 十 十 wl 

9 endfor 

10: Wslow = Wslow + slow (Wave Ew Wslow ) 

11: Wfast = fast + O'fast (Wave x Ufast) 

12: for m = 1 to M do 

13: with probability max{0.0, 1.0 — wrast /Wslow} do 
14: add random pose to X; 

15: else 

16: draw i € (1,..., N} with probability œ wf 
17: add x!” to x, 

18: endwith 

19: endfor 

20: return A; 











Table 8.3 An adaptive variant of MCL that adds random samples. The number 
of random samples is determined by comparing the short-term with the long-term 
likelihood of sensor measurements. 


10 and 11. The algorithm requires that 0 € aslow < os. The parameters 
Aslow, aNd fast, are decay rates for the exponential filters that estimate the 
long-term, and short-term, averages, respectively. The crux of this algorithm 
can be found in line 13: During the resampling process, a random sample is 
added with probability 


max{0.0, 1.0 一 Whast / Wslow } 


Otherwise, resampling proceeds in the familiar way. The probability of 
adding a random sample takes into consideration the divergence between 
the short- and the long-term average of the measurement likelihood. If the 
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short-term likelihood is better or equal to the long-term likelihood, no ran- 
dom sample is added. However, if the short-term likelihood is worse than 
the long-term one, random samples are added in proportion to the quotient 
of these values. In this way, a sudden decay in measurement likelihood in- 
duces an increased number of random samples. The exponential smoothing 
counteracts the danger of mistaking momentary sensor noise for a poor lo- 
calization result. 

Figure 8.16 illustrates our augmented MCL algorithm in practice. Shown 
there is a sequence of particle sets during global localization and relocal- 
ization of a legged robot equipped with a color camera, and operating on 
a 3x2m field as it was used in RoboCup soccer competitions. Sensor mea- 
surements correspond to the detection and relative localization of six visual 
markers placed around the field, as shown in Figure 7.7 on page 210. The 
algorithm described in Table 6.4 is used to determine the likelihood of detec- 
tions. Step 14 in Figure 8.3 is replaced by an algorithm for sampling accord- 
ing to the most recent sensor measurement, which is easily implemented 
using algorithm sample landmark model known correspondence in Ta- 
ble 6.5. 

Panels (a) through (d) in Figure 8.16 illustrate global localization. At the 
first marker detection, virtually all particles are drawn according to this de- 
tection (Figure 8.16b). This step corresponds to a situation in which the 
short-term average of the measurement probability is much worse than its 
long-term correspondent. After several more detections, the particles are 
clustered around the true robot position (Figure 8.16d), and both the short- 
and long-term average of the measurement likelihood increases. At this stage 
of localization, the robot is merely tracking its position, the observation like- 
lihoods are rather high, and only a very small number of random particles 
are occasionally added. 

As the robot is physically placed at a different location by a referee—a 
common event in robotic soccer tournaments—the measurement probability 
drops. The first marker detection at this new location does not yet trigger 
any additional particles, since the smoothed estimate wr, is still high (see 
Figure 8.16e). After several marker detections observed at the new location, 
Wfast decreases much faster than wslow and more random particles are added 
(Figure 8.16f&g). Finally, the robot successfully relocalizes itself as shown in 
Figure 8.16h, demonstrating that our augmented MCL algorithm is indeed 
capable of "surviving" the kidnapping. 
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Figure 8.16 Monte Carlo localization with random particles. Each picture shows a 
particle set representing the robot's position estimate (small lines indicate the orien- 
tation of the particles). The large circle depicts the mean of the particles, and the true 
robot position is indicated by the small, white circle. Marker detections are illustrated 
by arcs centered at the detected marker. The pictures illustrate global localization (a)- 


(d) and relocalization (e)-(h). 
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Figure 8.17 (a) plain MCL (top curve), MCL with random samples (center curve), 
and Mixture MCL with mixture proposal distribution (bottom curve). The error rate 
is measured in percentage of time during which the robot lost track of its position, for 
a data set acquired by a robot operating in a crowded museum. (b) Error as a function 
of time for standard MCL and mixture MCL, using a ceiling map for localization. 


8.3. Modifying the Proposal Distribution 


The MCL proposal mechanism is another source that can render MCL ineffi- 
cient. As discussed in Chapter 4.3.4, the particle filter uses the motion model 
as proposal distribution, but it seeks to approximate a product of this distri- 
bution and the perceptual likelihood. The larger the difference between the 
proposal and the target distribution, the more samples are needed. 

In MCL, this induces a surprising failure mode: If we were to acquire a 
perfect sensor that—without any noise—always informs the robot of its cor- 
rect pose, MCL would fail. This is even true for noise-free sensors that do 
not carry sufficient information for localization. An example of the latter 
would be a 1-D noise-free range sensor: When receiving such a range mea- 
surement, the space of valid pose hypotheses will be a 2-D subspace of the 
3-D pose space. We already discussed in length in Chapter 4.3.4 that chances 
to sample into this 2-D submanifold are zero when sampling from the robot 
motion model. Thus, we face the strange situation that under certain circum- 
stances, a less accurate sensor would be preferable to a more accurate sensor 
when using MCL for localization. This is not the case for EKF localization, 
since the EKF update takes the measurements into account when calculating 
the new mean—instead of generating mean(s) from the motion model alone. 

Luckily, a simple trick provides remedy: Simply use a measurement model 
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that artificially inflates the amount of noise in the sensor. One can think of 
this inflation as accommodating not just the measurement uncertainty, but 
also the uncertainty induced by the approximate nature of the particle filter 
algorithm. 

An alternative, more sound solution involves a modification of the sam- 
pling process which we already discussed briefly in Chapter 4.3.4. The idea 
is that for a small fraction of all particles, the role of the motion model and 
the measurement model are reversed: Particles are generated according to 
the measurement model 


al”! o plz | a) 


and the importance weight is calculated in proportion to 


wl) = [et | Ut, t-1) bel (x41) dz, 4 


This new sampling process is a legitimate alternative to the plain particle 
filter. It alone will be inefficient since it entirely ignores the history when gen- 
erating particles. However, it is equally legitimate to generate a fraction of 
the particles with either of those two mechanisms and merge the two particle 
sets. The resulting algorithm is called MCL with mixture proposal distribution, 
or Mixture MCL. In practice, it tends to suffice to generate a small fraction of 
particles (e.g., 5%) through the new process. 

Unfortunately, our idea does not come without challenges. The two main 
steps—sampling from p(z | zt) and calculating the importance weights 
wll —can be difficult to realize. Sampling from the measurement model is 
only easy if its inverse possesses a closed form solution from which it is easy 
to sample. This is usually not the case: imagine sampling from the space of 
all poses that fit a given laser range scan! Calculating the importance weights 
is complicated by the integral in (8.7), and by the fact that bel(x,_1) is itself 
represented by a set of particles. 

Without delving into too much detail, we note that both steps can be im- 
plemented, but only with additional approximations. Figure 8.17 shows 
comparative results for MCL, MCL augmented with random samples, and 
Mixture MCL for two real-world data sets. In both cases, p(z; | x+) was itself 
learned from data and represented by a density tree—an elaborate procedure 
whose description is beyond the scope of this book. For calculating the im- 
portance weights, the integral was replaced by a stochastic integration, and 
the prior belief was continued into a space-filling density by convolving each 
particle with a narrow Gaussian. Details aside, these results illustrate that the 
mixture idea yields superior results, but it can be challenging to implement. 
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We also note that the Mixture MCL provides a sound solution to the kid- 
napped robot problem. By seed-starting particles using the most recent mea- 
surement only, we constantly generate particles at locations that are plausi- 
ble given the momentary sensor input, regardless of past measurements and 
controls. There exist ample evidence in the literature that such approaches 
can cope well with total localization failure (Figure 8.17b happens to show 
one such failure for regular MCL), hence provides improved robustness in 
practical implementations. 


KLD-Sampling: Adapting the Size of Sample Sets 


The size of the sample sets used to represent beliefs is an important parame- 
ter for the efficiency of particle filters. So far we only discussed particle filters 
that use sample sets of fixed size. Unfortunately, to avoid divergence due to 
sample depletion in MCL, one has to choose large sample sets so as to allow 
a mobile robot to address both the global localization and the position track- 
ing problem. This can be a waste of computational resources, as Figure 8.13 
reveals. In this example, all sample sets contain 100,000 particles. While such 
a high number of particles might be necessary to accurately represent the be- 
lief during early stages of localization (cf. Figure 8.13a), it is obvious that only 
a small fraction of this number suffices to track the position of the robot once 
it knows where it is (Figure 8.13c). 

KLD-sampling is a variant of MCL that adapts the number of particles over 
time. We do not provide a mathematical derivation of KLD-sampling, but 
only state the algorithm and show some experimental results. The name 
KLD-sampling is derived from the Kullback-Leibler divergence, which is a mea- 
sure of the difference between two probability distributions. The idea behind 
KLD-sampling is to determine the number of particles based on a statistical 
bound on the sample-based approximation quality. More specifically, at each 
iteration of the particle filter, KLD-sampling determines the number of sam- 
ples such that, with probability 1 — à, the error between the true posterior 
and the sample-based approximation is less than e. Several assumptions not 
stated here make it possible to derive an efficient implementation of this idea. 

The KLD-sampling algorithm is shown in Table 8.4. The algorithm takes as 
input the previous sample set along with the map and the most recent control 
and measurement. In contrast to MCL, KLD-sampling takes a weighted sam- 
ple set as input. That is, the samples in X. ; are not resampled. Additionally, 
the algorithm requires the statistical error bounds £ and ô. 

In a nutshell, KLD-sampling generates particles until the statistical bound 
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Algorithm KLD Sampling MCL(%;_1, Ut, zt, m, €, Ô): 
A-f0 
M =0, My =0,k =0 
for allb in H do 


b = empty 


endfor 


do 


while M < My or M < My 


draw i with probability cx wl), 








ol) 三 sample motion model(u,, z"! ) 
wl! = measurement model(z ol) m) 
t aa — tot $ 
Xi = Xy + (a), wll) 
if íi falls into empty binb then 
k=k+1 
b = non-empty 
if k>1 then 
3 
My : Ezt {1 SU Ta ed 
endif 
M=M+1 


min 


return X; 








Table8.4 KLD-sampling MCL with adaptive sample set size. The algorithm gener- 
ates samples until a statistical bound on the approximation error is reached. 


in line 16 is satisfied. This bound is based on the "volume" of the state space 
that is covered by particles. The volume covered by particles is measured 
by a histogram, or grid, overlayed over the 3-dimensional state space. Each 
bin in the histogram H is either empty or occupied by at least one particle. 
Initially, each bin is set to empty (Lines 4 through 6). In line 8, a particle is 
drawn from the previous sample set. Based on this particle, a new particle is 
predicted, weighted, and inserted into the new sample set (Lines 9-11, just 
like in MCL). 

Lines 12 through 19 implement the key idea of KLD-sampling. If the newly 
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generated particle falls into an empty bin of the histogram, then the number k 
of non-empty bins is incremented and the bin is marked as non-empty. Thus, 
k measures the number of histogram bins filled with at least one particle. 
This number plays a crucial role in the statistical bound determined in line 16. 
The quantity M, gives the number of particles needed to reach this bound. 
Note that for a given e, M, is mostly linear in the number k of non-empty 
bins; the second, nonlinear term becomes negligible as k increases. The term 
21—5 is based on the parameter ô. It represents the upper 1 — ô quantile of the 
standard normal distribution. The values of 2 -5 for typical values of 6 are 
readily available in standard statistical tables. 

The algorithm generates new particles until their number M exceeds M, 
and a user-defined minimum My: As can be seen, the threshold M, serves 
as a moving target for M. The more samples M are generated, the more bins 
k in the histogram are non-empty, and the higher the threshold My. 

In practice, the algorithm terminates based on the following reasoning. In 
the early stages of sampling, k increases with almost every new sample since 
virtually all bins are empty. This increase in k results in an increase in the 
threshold My. However, over time, more and more bins are non-empty and 
My increases only occasionally. Since M increases with each new sample, M 
will finally reach M, and sampling is stopped. When this happens depends 
on the belief. The more widespread the particles, the more bins are filled and 
the higher the threshold M... During tracking, KLD-sampling generates less 
samples since the particles are concentrated on a small number of different 
bins. It should be noted that the histogram has no impact on the particle 
distribution itself. Its only purpose is to measure the complexity, or volume, 
of the belief. The grid is discarded at the end of each particle filter iteration. 

Figure 8.18 shows the sample set sizes during a typical global localization 
run using KLD-sampling. The figure shows graphs when using a robot's 
laser range-finder (solid line) or ultrasound sensors (dashed line). In both 
cases, the algorithm chooses a large number of samples during the initial 
phase of global localization. Once the robot is localized, the number of parti- 
cles drops to a much lower level (less than 176 of the initial number of parti- 
cles). When and how fast this transition from global localization to position 
tracking happens depends on the type of the environment and the accuracy 
of the sensors. In this example, the higher accuracy of the laser range-finder 
is reflected by an earlier transition to a lower level. 

Figure 8.19 shows a comparison between the approximation error of KLD- 
sampling and MCL with fixed sample sets. The approximation error is mea- 
sured by the Kullback-Leibler distance between the beliefs (sample sets) gen- 
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Figure 8.18 KLD-sampling: Typical evolution of number of samples for a global 
localization run, plotted against time (number of samples is shown on a log scale). 
The solid line shows the number of samples when using the robot's laser range-finder, 
the dashed graph is based on sonar sensor data. 


erated with varying numbers of samples and the "optimal" beliefs. These 
"optimal" beliefs were generated by running MCL with sample sets of size 
200,000, which is far more than actually needed for position estimation. As 
expected, the more samples are used by the two approaches, the smaller the 
approximation error. The dashed graph shows the results achieved by MCL 
with different sample set sizes. As can be seen, the fixed approach requires 
about 50,000 samples before it converges to a KL-distance below 0.25. Larger 
errors typically indicate that the particle filter diverges and the robot is not 
able to localize. The solid line shows the results when using KLD-sampling. 
Here the sample set sizes are averages over the global localization runs. The 
different data points were obtained by varying the error bound e between 0.4 
and 0.015, decreasing from left to right. KLD-sampling converges to a small 
error level using only 3,000 samples on average. The graph also shows that 
KLD-sampling is not guaranteed to accurately track the optimal belief. The 
leftmost data points on the solid line indicate that KLD-sampling diverges 
due to too loose error bounds. 

KLD-sampling can be used by any particle filter, not just MCL. The his- 
togram can be implemented either as a fixed, multi-dimensional grid, or 
more compactly as tree structures. In the context of robot localization, KLD- 
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Figure 8.19 Comparison of KLD-sampling and MCL with fixed sample set sizes. 
The a-axis represents the average sample set size. The y-axis plots the KL-distance 
between the reference beliefs and the sample sets generated by the two approaches. 


sampling has been shown to consistently outperform MCL with fixed sample 
set sizes. The advantage of this technique is most significant for a combina- 
tion of global localization and tracking problems. In practice, good results 
are achieved with error bound values around 0.99 for (1 — 6) and 0.05 for < 
in combination with histogram bin sizes of 50cm x 50cm x 15deg. 


Localization in Dynamic Environments 


A key limitation of all localization algorithms discussed thus far arises from 
the static world assumption, or Markov assumption. Most interesting en- 
vironments are populated by people, and hence exhibit dynamics not mod- 
eled by the state z,. To some extent, probabilistic approaches are robust to 
such unmodeled dynamics, due to their ability to accommodate sensor noise. 
However, as previously noted, the type of sensor noise accommodated in 
the probabilistic filtering framework must be independent at each time step, 
whereas unmodeled dynamics induce effects on the sensor measurements 
over multiple time steps. When such effects are paramount, probabilistic 
localization algorithms that rely on the static world assumption may fail. 

A good example of such a failure situation is shown in Figure 8.20. This 
example involves a mobile tour-guide robot, navigating in museums full of 
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Figure 8.20 Scenes from the "Deutsches Museum Bonn," where the mobile robot 
“Rhino” was frequently surrounded by dozens of people. 
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Figure 8.21 Laser range scans are often heavily corrupted when people surround 
the robot. How can a robot maintain accurate localization under such circumstances? 


people. The people—their locations, velocities, intentions etc.—are hidden 
state relative to the localization algorithm that is not captured in the algo- 
rithms discussed thus far. Why is this problematic? Imagine people lining 
up in a way that suggests the robot is facing a wall. With each single sen- 
sor measurement the robot increases its belief of being next to a wall. Since 
information is treated as independent, the robot will ultimately assign high 
likelihood to poses near walls. Such an effect is possible with independent 
sensor noise, but its likelihood is vanishingly small. 
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There exist two fundamental techniques for dealing with dynamic envi- 
ronments. The first technique, state augmentation includes the hidden state 
into the state estimated by the filter; the other, outlier rejection pre-processes 
sensor measurements to eliminate measurements affected by hidden state. 
The former methodology is mathematically the more general one: Instead 
of just estimating the robot’s pose, one can define a filter that also estimates 
people’s positions, their velocities, etc. In fact, we will later on discuss such 
an approach, as an extension to a mobile robot mapping algorithm. 

The principle disadvantage of estimating the hidden state variables lies 
in its computational complexity: Instead of estimating 3 variables, the robot 
must now calculate posteriors over a much larger number of variables. In 
fact, the number of variables itself is a variable, as the number of people 
may vary over time. Thus, the resulting algorithm will be substantially more 
involved than the localization algorithms discussed thus far. 

The alternative, outlier rejection, works well in certain limited situations, 
which includes situations where people’s presence may affect range finders 
or (to a lesser extent) camera images. Here we develop it for the beam-based 
range finder model from Chapter 6.3. 

The idea is to investigate the cause of a sensor measurement, and to reject 
those likely to be affected by unmodeled environment dynamics. The sensor 
models discussed thus far all address different, alternative ways by which 
a measurement can come into existence. If we manage to associate specific 
ways with the presence of unwanted dynamic effects—such as people—all 
we have to do is to discard those measurements that are with high likelihood 
caused by such an unmodeled entity. 

This idea is surprisingly general; and in fact, the mathematics are essen- 
tially the same as in the EM learning algorithm in Chapter 6.3, but applied in 
an online fashion. In Equation(6.12), Chapter 6.3, we defined the beam-based 
measurement model for range finders as a mixture of four terms: 


T 
Zhit Phi (27 | z,,m) 
k 
k Zshort Pshort (27 | Tt, m) 
Z. TMm = 2 
P( i | a ) Zmax Dinas(en | z,,m) 
Zrand Prana (zË | zt, m) 


As our derivation of the model clearly states, one of those terms, the one 
involving Zshort and Pshort, corresponds to unexpected objects. To calculate 
the probability that a measurement z7 corresponds to an unexpected object, 
we have to introduce a new correspondence variable, cF which can take on 
one of the four values (hit, short, max, rand]. 
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The posterior probability that the range measurement 次 corresponds to a 
“short” reading—our mnemonic from Chapter 6.3 for unexpected obstacle— 
is then obtained by applying Bayes rule and subsequently dropping irrele- 
vant conditioning variables: 


per = short | 2 Z1:t—15 u1;,m) 


p(zt | e = short, Z1:t—1; urt, m) pe = short | Zl:t—1: ur: mM) 





3v | of = 6 zii, uem) D( = c | 214-1, uns m) 
c 


p(zt | 2 一 Short， Z1:t—-1) wr m) pe = short) 





se | a = e 21-1, uc m) P(e = c) 
c 


Here the variable c in the denominator takes on any of the four values 
(hit, short, max, rand}. Using the notation in Equation (8.8), the prior p(c? = 
c) is given by the variables znit, Zshort, Zmax; ANd Zrana, for the four values of 
c. The remaining probability in (8.9) is obtained by integrating out z: 


p(zt | p? = €, 21:t-1, U1:t; m) 

= [oe | Te, CË =C, 21-1) U1:t, M) p(zi | e =C, Z1:t—1, U1:t, M) dai 

= [oe | Te, Cf =C, m) p(zi | Z1:t—1; 1:4, M) dat 

k E TT 

= [oe | x4, 0? = c, m) bel(a,) da 
Probabilities of the form p(zt | ze, gh = c, m) were abbreviated as puit, Pshort, 
Pmax, ANd Prana in Chapter 6.3. This gives us the expression for desired prob- 
ability (8.9): 


f mtt | Lt, m) short bel(a;) dx, 


[Erd | x4,m) ze bel(z,) da, 





pe = short | g Zl1:u—1s U14, m) = 


In general, the integrals in (8.11) do not possess closed-form solutions. To 
evaluate them, it suffices to approximate them with a representative sam- 
ple of the posterior bel(x,) over the state z,. Those samples might be high- 
likelihood grid cells in grid localizer, or particles in a MCL algorithm. The 
measurement is then rejected if its probability of being caused by an unex- 
pected obstacle exceeds a user-selected threshold x. 
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Algorithm test range measurement(z7, A,, m): 
p=q=0 
form = 1 to M do 
p = p T Zshort * Dshort (27 | al” 
q — + Zuit ` Prit (27 | ol, m) + Zshort * Pshort (27 | al” 
+z | p (2f | ol”, m) + Zrand * Brana (Ze | al”! qm) 
max max \“¢ t T t 
endfor 
ifp/q € x then 
return accept 
10: else 
11 return reject 
12 endif 


] 


,m) 


] 





,m) 








Table8.5 Algorithm for testing range measurements in dynamic environment. 


Table 8.5 depicts an implementation of this technique in the context of par- 
ticle filters. It requires as input a particle set A representative of the belief 
bel(zi), along with a range measurement n and a map. It returns "reject" 
if with probability larger than x the measurement corresponds to an unex- 
pected object; otherwise it returns "accept." This routine precedes the mea- 
surement integration step in MCL. 

Figure 8.22 illustrates the effect of the filter. Shown in both panels are a 
range scan, for a different alignment of the robot pose. The lightly shaded 
scans are above threshold and rejected. A key property of our rejection mech- 
anism is that it tends to filter out measurements that are "surprisingly" short, 
but leaves others in place that are "surprisingly" long. This asymmetry re- 
flects the fact that people's presence tends to cause shorter-than-expected 
measurements. By accepting surprisingly long measurements, the approach 
maintains its ability to recover from global localization failures. 

Figure 8.23 depicts an episode during which a robot navigates through an 
environment that is densely populated with people (see Figure 8.21). Shown 
there is the robot's estimated path along with the endpoints of all scans in- 
corporated into the localizer. This figure shows the effectiveness of remov- 
ing measurements that do not correspond to physical objects in the map: 
there are very few "surviving" range measurements in the freespace for the 
right diagram, in which measurements are accepted only if they surpass the 
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(a) (b) 








Figure 8.22 Illustration of our measurement rejection algorithm: Shown in both di- 
agrams are range scans (no max-range readings). Lightly shaded readings are filtered 
out. 


threshold test. 

As a rule of thumb, outlier rejection of measurements is generally a good 
idea. There exist almost no static environments; even in office environments 
furniture is moved, doors are opened/closed, etc. Our specific implementa- 
tion here benefits from the asymmetry of range measurements: people make 
measurements shorter, not longer. When applying the same idea to other 
data (e.g., vision data) or other types of environment modifications (e.g., the 
removal of a physical obstacle), such an asymmetry might not exist. Never- 
theless, the same probabilistic analysis is usually applicable. The disadvan- 
tage of the lack of such a symmetry might be that it becomes impossible to 
recover from global localization failures, as every surprising measurement is 
rejected. In such cases, it may make sense to impose additional constraints, 
such as a limit on the fraction of measurements that may be corrupted. 

We note that the rejection test has found successful application even in 
highly static environments, for reasons that are quite subtle. The beam-based 
sensor model is discontinuous: Small changes of pose can drastically alter 
the posterior probability of a sensor measurement. This is because the result 
of ray casting is not a continuous function in pose parameters such as the 
robot orientation. In environment with cluttered objects, this discontinuity 
increases the number of particles necessary for successful localization. By 
manually removing clutter from the map—and instead letting the filter man- 
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Figure 8.23 Comparison of (a) standard MCL and (b) MCL with the removal of 
sensor measurements likely caused by unexpected obstacles. Both diagrams show 
the robot path and the end-points of the scans used for localization. 


age the resulting “surprisingly” short measurements—the number of par- 
ticles can be reduced drastically. The same strategy does not apply to the 
likelihood field model, since this model is smooth in the pose parameters. 


Practical Considerations 


Table 8.6 summarizes and compares the main localization techniques dis- 
cussed in this and the previous chapter. When choosing a technique, a num- 
ber of requirements have to be traded off. A first question will always be 
whether it is preferable to extract features from sensor measurements. Ex- 
tracting features may be beneficial from a computational perspective, but it 
comes at the price of reduced accuracy and robustness. 

While in this chapter, we discussed techniques for handling dynamic envi- 
ronments in the context of the MCL algorithm, similar ideas can be brought 
to bear with other localization techniques as well. In fact, the techniques 
discussed here are only representative of a much richer body of approaches. 

When implementing a localization algorithm, it is worthwhile to play with 
the various parameter settings. For example, the conditional probabilities are 
often inflated when integrating nearby measurements, so as to accommo- 
date unmodeled dependencies that always exist in robotics. A good strategy 
is to collect reference data sets, and tune the algorithm until the overall re- 


5rjs.cn 000000 


274 


8.6 


8 Mobile Robot Localization: Grid And Monte Carlo 
























































EKF MHT Coarse fine (metric) | MCL 
(topologi- grid 
cal) grid 
Measurements landmarks landmarks landmarks raw mea- | raw mea- 
surements surements 
Measurement Gaussian Gaussian any any any 
noise 
Posterior Gaussian mixture of | histogram histogram particles 
Gaussians 
Efficiency 十 十 十 十 十 - 十 
(memory) 
Efficiency ++ 十 Æ p TS 
(time) 
Ease of imple- 十 十 二 Sep 
mentation 
Resolution 十 十 平平 z ah i 
Robustness = + + 十 十 十 十 
Global local- || no yes yes yes yes 
ization 





























Table 8.6 Comparison of different implementations of Markov localization. 


sult is satisfactory. This is necessary because no matter how sophisticated 
the mathematical model, there will always remain unmodeled dependencies 
and sources of systematic noise that affect the overall result. 


Summary 


In this chapter, we discussed two families of probabilistic localization algo- 
rithms, grid techniques and Monte Carlo localization (MCL). 


* Grid techniques represent posteriors through histograms. 


* The coarseness of the grid trades off accuracy and computational effi- 
ciency. For coarse grids, it is usually necessary to adjust the sensor and 
motion models to account for effects that arise from the coarseness of the 
representation. For fine grids, it may be necessary to update grid cells 
selectively to reduce the overall computation. 
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* The Monte Carlo localization algorithm represents the posterior using 
particles. The accuracy-computational costs trade-off is achieved through 
the size of the particle set. 


* Both grid localization and MCL can globally localize robots. 


。 By adding random particles, MCL also solves the kidnapped robot prob- 
lem. 


* Mixture MCL is an extension that inverts the particle generation process 
for a fraction of all particles. This yields improved performance specifi- 
cally for robots with low-noise sensors, but at the expense of a more com- 
plex implementation. 


* KLD-sampling increases the efficiency of particle filters by adapting the 
size of sample sets over time. The advantage of this approach is maximal 
if the complexity of the beliefs varies drastically over time. 


* Unmodeled environment dynamics can be accommodated by filtering 
sensor data, rejecting those that with high likelihood correspond to an 
unmodeled object. When using range sensors, the robot tends to reject 
measurements that are surprisingly short. 


The popularity of MCL is probably due to two facts: MCL is just about the 
easiest localization algorithm to implement, and it is also one of the most 
potent ones, in that it can approximate nearly any distribution. 


Bibliographical Remarks 


Grid-based Monte Carlo localization was introduced by Simmons and Koenig, based on the re- 
lated method of maintaining certainty factors by Nourbakhsh et al. (1995). Since Simmons and 
Koenig's (1995) seminal paper, a number of techniques emerged that maintained histograms 
for localization (Kaelbling et al. 1996). While the initial work used relatively coarse grids to 
accommodate the enormous computational overhead of updating grids, Burgard et al. (1996) 
introduced selective update techniques that could cope with grids of much higher resolution. 
This development was often seen as a shift from coarse, topological Markov localization to de- 
tailed, metric localization. Overview articles about this body of work can be found in Koenig 
and Simmons (1998); Fox et al. (1999c). 

For a number of years, grid-based techniques were considered state of the art in mobile 
robot localization. Different successful applications of grid-based Markov localization can be 
found. For example, Hertzberg and Kirchner (1996) applied this technique to robots operating 
in sewage pipes, Simmons et al. (2000b) used this approach to localize a robot in an office en- 
vironment, and Burgard et al. (1999a) applied the algorithm to estimate the position of a robot 
operating in museums. Konolige and Chou (1999) introduced the idea of map matching into 
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Markov localization, by using fast convolution techniques for computing pose probabilities. An 
extension that combined both global localization and high-accuracy tracking was described in 
Burgard et al. (1998), who coined their technique dynamic Markov localization. A machine learn- 
ing technique for learning to recognize places was introduced by Oore et al. (1997). Thrun 
(1998a) extended the approach by a learning component for identifying suitable landmarks in 
the environment, based on related work by Greiner and Isukapalli (1994). The mathematical 
framework was extended by Mahadevan and Khaleeli (1999) to a framework known as semi 
Markov decision process, which made it possible to reason about the exact time when a transition 
from one cell to another occurred. An experimental comparison between grid-approaches and 
Kalman filtering techniques was carried out by Gutmann et al. (1998). Active localization was 
introduced for the grid-based paradigm in Burgard et al. (1997), and since extended to multi- 
hypothesis tracking by Austin and Jensfelt (2000); Jensfelt and Christensen (2001a). Fox et al. 
(2000) and Howard et al. (2003) extended this approach to the multi-robot localization problem. 
Moving away from the grid-based paradigm, Jensfelt and Christensen (2001a); Roumeliotis and 
Bekey (2000); Reuter (2000) showed that multi-hypothesis EKFs were equally suited for global 
localization problem. 

Motivated by the famous condensation algorithm in computer vision (Isard and Blake 1998), 
Dellaert et al. (1999); Fox et al. (1999a) were the first to develop particle filters for mobile robot 
localization. They also coined the term Monte Carlo Localization which has become the common 
name of this technique in robotics. The idea of adding random samples can be found in Fox 
et al. (1999a). An improved technique to deal with the kidnapped robot problem was Lenser 
and Veloso’s (2000) sensor resetting technique, in which a number of particles were jump-started 
using only the most recent measurement. Fox built on this technique and introduced the Aug- 
mented MCL algorithm for determining the number of particles to be added (Gutmann and Fox 
2002). Mixture MCL algorithm is due to Thrun et al. (2000c); see also van der Merwe et al. (2001). 
It provided a mathematical basis for generating samples from measurements. KLD-sampling, 
the adaptive version of particle filters, was introduced by Fox (2003). Jensfelt et al. (2000) and 
Jensfelt and Christensen (2001b) applied MCL to feature-based maps, and Kwok et al. (2004) 
introduced a real-time version of MCL that adapts the number of particles. Finally, a number 
of papers have applied MCL to robots with cameras (Lenser and Veloso 2000; Schulz and Fox 
2004; Wolf et al. 2005), including omnidirectional cameras (Króse et al. 2002; Vlassis et al. 2002). 

Particle filters have also been used for a number of related tracking and localization problems. 
Montemerlo et al. (2002b) studied the problem of simultaneous localization and people tracking, 
using a nested particle filter. A particle filter for tracking variable number of people is described 
in Schulz et al. (2001b), who demonstrated how moving people can be tracked reliably with a 
moving robot in an unknown environment (Schulz et al. 2001a). 


Exercises 


1. Consider a robot with d state variables. For example, the kinematic state 
of a free-flying rigid robot is usually d — 6; when velocities are included 
in the state vector, the dimension increases to d — 12. How does the 
complexity (update time and memory) of the following localization al- 
gorithms increase with d: EKF localization, grid localization, and Monte 
Carlo localization. Use the O( ) notation, and argue why your answer is 
correct. 
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2. Provide a mathematical derivation of the additive form of the multi- 
feature information integration in lines 14 and 15 in Table 7.2. 


3. Prove the correctness of Equation (8.4), page 257, in the limit T oo. 


4. As noted in the text, Monte Carlo localization is biased for any finite sam- 
ple size—i.e., the expected value of the location computed by the algo- 
rithm differs from the true expected value. In this question, you are asked 
to quantify this bias. 


To simplify, consider a world with four possible robot locations: X — 
(21, 22, 23, £4}. Initially, we draw N > 1 samples uniformly from among 
those locations. As usual, it is perfectly acceptable if more than one sam- 
ple is generated for any of the locations X. Let Z be a Boolean sensor 
variable characterized by the following conditional probabilities: 


p(z|ai) = 08 p(^z | #1) 0.2 
p(z|z2) = 04 p(z | 22) = 0.6 
p(z|zs) = 01 p(z |x) = 0.9 
p(z|z4) = 0.1 pz | 24) = 0.9 


MCL uses these probabilities to generate particle weights, which are sub- 
sequently normalized and used in the resampling process. For simplicity, 
let us assume we only generate one new sample in the resampling pro- 
cess, regardless of N. This sample might correspond to any of the four 
locations in X. Thus, the sampling process defines a probability distribu- 
tion over X. 


(a) What is the resulting probability distribution over X for this new 
sample? Answer this question separately for N — 1,...,10, and for 
N = oo. 

(b) The difference between two probability distributions p and q can be 
measured by the KL divergence, which is defined as 





KL(p,q) = > p(z;) log 
What are the KL divergences between the distributions in (a) and the 
true posterior? 


(c) What modification of the problem formulation (not the algorithm!) 
would guarantee that the specific estimator above is unbiased even for 
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finite values of N? Provide at least two such modifications (each of 
which should be sufficient). 


5. Consider a robot equipped with a range/bearing sensor of the type dis- 
cussed in Chapter 6.6. In this question, you are asked to devise an efficient 
sampling procedure that can incorporate k simultaneous measurements 
of identifiable landmarks. To illustrate that your routine works, you might 
generate plots of different landmark configurations, using k = 1,...,5 
adjacent landmarks. Argue what makes your routine efficient. 


6. Exercise 3 on page 235 described a simplistic underwater robot that can 
listen to acoustic beacons for localization. Here you are asked to imple- 
ment a grid localization algorithm for this robot. Analyze the accuracy 
and the failure modes in the context of the three localization problems: 
global localization, position tracking, and the kidnapped robot problem. 
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Introduction 


The previous two chapters discussed the application of probabilistic tech- 
niques to a low-dimensional perceptual problem, that of estimating a robot's 
pose. We assumed that the robot was given a map in advance. This assump- 
tion is legitimate in quite a few real-world applications, as maps are often 
available a priori or can be constructed by hand. Some application domains, 
however, do not provide the luxury of coming with an a priori map. Surpris- 
ingly enough, most buildings do not comply with the blueprints generated 
by their architects. And even if blueprints were accurate, they would not 
contain furniture and other items that, from a robot’s perspective, determine 
the shape of the environment just as much as walls and doors. Being able to 
learn a map from scratch can greatly reduce the efforts involved in installing 
a mobile robot, and enable robots to adapt to changes without human super- 
vision. In fact, mapping is one of the core competencies of truly autonomous 
robots. 

Acquiring maps with mobile robots is a challenging problem for a number 
of reasons: 


* The hypothesis space, which is the space of all possible maps, is huge. 
Since maps are defined over a continuous space, the space of all maps has 
infinitely many dimensions. Even under discrete approximations, such 
as the grid approximation that shall be used in this chapter, maps can 
easily be described 10° or more variables. The sheer size of this high- 
dimensional space makes it challenging to calculate full posteriors over 
maps; hence, the Bayes filtering approach that worked well for localiza- 
tion is inapplicable to the problem of learning maps, at least in its naive 
form discussed thus far. 
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Learning maps is a "chicken-and-egg" problem, for which reason it is 
often referred to as the simultaneous localization and mapping (SLAM) or 
concurrent mapping and localization problem. First, there is a localization 
problem. When the robot moves through its environment, it accumu- 
lates errors in odometry, making it gradually less certain as to where it 
is. Methods exist for determining the robot's pose when a map is avail- 
able, as we have seen in the previous chapter. Second, there is a mapping 
problem. Constructing a map when the robot's poses are known is also 
relatively easy—a claim that will be substantiated in this chapter and sub- 
sequent chapters. In the absence of both an initial map and exact pose 
information, however, the robot has to do both: estimating the map and 
localizing itself relative to this map. 


Of course, not all mapping problems are equally hard. The hardness of the 
mapping problem is the result of a collection of factors, the most important 
of which are: 


Size. The larger the environment relative to the robot's perceptual range, 
the more difficult it is to acquire a map. 


Noise in perception and actuation. If robot sensors and actuators were 
noise-free, mapping would be a simple problem. The larger the noise, the 
more difficult the problem. 


Perceptual ambiguity. The more frequently different places look alike, 
the more difficult it is to establish correspondence between different loca- 
tions traversed at different points in time. 


Cycles. Cycles in the environment are particularly difficult to map. If a 
robot just goes up and down a corridor, it can correct odometry errors 
incrementally when coming back. Cycles make robots return via different 
paths, and when closing a cycle the accumulated odometric error can be 
huge! 


To fully appreciate the difficulty of the mapping problem, consider Fig- 
ure 9.1. Shown there is a data set, collected in a large indoor environment. 
Figure 9.1a was generated using the robot's raw odometry information. Each 
black dot in this figure corresponds to an obstacle detected by the robot's 
laser range finder. Figure 9.1b shows the result of applying mapping algo- 
rithms to this data, one of which was the techniques described in this chapter. 
This example gives a good flavor of the problem at stake. 
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(a) 


(b) 





Figure 9.1 (a) Raw range data, position indexed by odometry. (b) Occupancy grid 
map. 


In this chapter, we first study the mapping problem under the restrictive 
assumption that the robot poses are known. Put differently, we side-step 
the hardness of the SLAM problem by assuming some oracle informs us 
of the exact robot path during mapping. This problem, whose graphical 
model is depicted in Figure 9.2, is also known as mapping with known poses. 
We will discuss a popular family of algorithms, collectively called occupancy 
grid mapping. Occupancy grid mapping addresses the problem of generat- 
ing consistent maps from noisy and uncertain measurement data, under the 
assumption that the robot pose is known. The basic idea of the occupancy 
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9 Occupancy Grid Mapping 
Figure 9.2 Graphical model of mapping with known poses. The shaded variables 


(poses x and measurements z) are known. The goal of mapping is to recover the map 
m. 


grids is to represent the map as a field of random variables, arranged in an 
evenly spaced grid. Each random variable is binary and corresponds to the 
occupancy of the location it covers. Occupancy grid mapping algorithms 
implement approximate posterior estimation for those random variables. 

The reader may wonder about the significance of a mapping technique 
that requires exact pose information. After all, no robot's odometry is perfect! 
The main utility of the occupancy grid technique is in post-processing: Many 
of the SLAM techniques discussed in subsequent chapters do not generate 
maps fit for path planning and navigation. Occupancy grid maps are often 
used after solving the SLAM problem by some other means, and taking the 
resulting path estimates for granted. 


The Occupancy Grid Mapping Algorithm 


The gold standard of any occupancy grid mapping algorithm is to calculate 
the posterior over maps given the data 


p(m | Zl: Lt) 


As usual, m is the map, 21., the set of all measurements up to time t, and 
11: is the path of the robot defined through the sequence of all poses. The 
controls u4:; play no role in occupancy grid maps, since the path is already 
known. Hence, they will be omitted throughout this chapter. 

The types of maps considered by occupancy grid maps are fine-grained 
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grids defined over the continuous space of locations. By far the most com- 
mon domain of occupancy grid maps are 2-D floor plan maps, which de- 
scribe a 2-D slice of the 3-D world. 2-D maps are usually the representation of 
choice when a robot navigates on a flat surface, and the sensors are mounted 
so that they capture only a slice of the world. Occupancy grid techniques 
generalize to 3-D representations but at significant computational expenses. 

Let m; denote the grid cell with index i. An occupancy grid map partitions 
the space into finitely many grid cells: 


Each mi has attached to it a binary occupancy value, which specifies whether 
a cell is occupied or free. We will write “1” for occupied and "0" for free. 
The notation p(m; = 1) or p(mj) refers to the probability that a grid cell is 
occupied. 

The problem with the posterior in Equation (9.1) is its dimensionality: the 
number of grid cells in maps like the one shown in Figure 9.1 are in the tens 
of thousands. For a map with 10,000 grid cells, the number of maps that can 
be represented by this map is 210000. Calculating a posterior probability for 
each single map is therefore intractable. 

The standard occupancy grid approach breaks down the problem of esti- 
mating the map into a collection of separate problems, namely that of esti- 
mating 


p(m; | Zl: L1:t) 


for all grid cell m;. Each of these estimation problems is now a binary prob- 
lem with static state. This decomposition is convenient but not without prob- 
lems. In particular, it does not enable us to represent dependencies among 
neighboring cells; instead, the posterior over maps is approximated as the 
product of its marginals: 


p(m | Zit i:t) = Il p(m; | Mie) 
i 


We will return to this issue in Chapter 9.4 below, when we discuss more 
advanced mapping algorithms. For now, we will adopt this factorization for 
convenience. 

Thanks to our factorization, the estimation of the occupancy probability 
for each grid cell is now a binary estimation problem with static state. A 
filter for this problem was already discussed in Chapter 4.2: the binary Bayes 
filter. The corresponding algorithm was depicted in Table 4.2 on page 94. 
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A 


Algorithm occupancy. grid mapping((li 1i), £t, 21): 


for all cells m; do 
if m; in perceptual field of z, then 
lii = l1; + inverse sensor model(m;, £+, z+) — lo 
else 
leg = ha 
endif 
endfor 
return {lri} 








Table 9.1 The occupancy grid algorithm, a version of the binary Bayes filter in Ta- 
ble 4.2. 


The algorithm in Table 9.1 applies this filter to the occupancy grid mapping 
problem. As in the original filter, our occupancy grid mapping algorithm 
uses the log odds representation of occupancy: 


p(m; | End) 
1 — p(mi | Pip, Dre) 





liu; = log 


This representation is already familiar from Chapter 4.2. The advantage of 
the log odds over the probability representation is that we can avoid numeri- 
cal instabilities for probabilities near zero or one. The probabilities are easily 
recovered from the log odds ratio: 


1 


In; | Z1:t; : LU a 
p(mi | 21:4, 214) 1+ exp{lt i} 


The algorithm occupancy_grid_mapping in Table 9.1 loops through all 
grid cells i, and updates those that fall into the sensor cone of the measure- 
ment z. For those where it does, it updates the occupancy value by virtue 
of the function inverse sensor model in line 4 of the algorithm. Otherwise, 
the occupancy value remains unchanged, as indicated in line 6. The constant 
lo is the prior of occupancy represented as a log odds ratio: 
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(a) (b) 


























Figure 9.3 Two examples of an inverse measurement model in- 
verse_range_sensor_model for two different measurement ranges. The darkness of 
each grid cell corresponds to the likelihood of occupancy. This model is somewhat 
simplistic; in contemporary implementations the occupancy probabilities are usually 
weaker at the border of the measurement cone. 


The function inverse_sensor_model implements the inverse measurement 
model p(m; | z+, x+) in its log odds form: 


p(mi; | Zt, T1) 
1 — p(m; | Zt, Tt) 





inverse sensor model(m;,z;,7;) = log 


A somewhat simplistic example of such a function for range finders is given 
in Table 9.2 and illustrated in Figure 9.3a&b. This model assigns to all cells 
within the sensor cone whose range is close to the measured range an occu- 
pancy value of locc. In Table 9.2, the width of this region is controlled by the 
parameter a, and the opening angle of the beam is given by 8. We note that 
this model is somewhat simplistic; in contemporary implementations the oc- 
cupancy probabilities are usually weaker at the border of the measurement 
cone. 

The algorithm inverse sensor model calculates the inverse model by first 
determining the beam index k and the range r for the center-of-mass of the 
cell m;. This calculation is carried out in lines 2 through 5 in Table 9.2. As 
usual, we assume that the robot pose is given by x; = (x y 0)". In line 7, it 
returns the prior for occupancy in log odds form whenever the cell is outside 
the measurement range of this sensor beam, or if it lies more than a/2 behind 
the detected range zk. In line 9, it returns loce > lo if the range of the cell is 
within ta/2 of the detected range zk. It returns Ifee < lo if the range to the 
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1: Algorithm inverse_range_sensor_model(m,, x+, z+): 
2: Let x;, y; be the center-of-mass of m; 

3: r= (zi - z)? + (yi - y? 

4: $ = atan2(y; — y, vi — 1) — 0 

5: k = argmin; | — 9j,sens| 

6: ifr > min(zmax, 2? + 0/2) or |ó — 0x sens| > 8/2 then 
7: return lo 

8: if zF < Zmax and |r — zF| < a/2 

9: return locc 

10: ifr < ae 

11: return /free 

12: endif 











Table 9.2 A simple inverse measurement model for robots equipped with range 
finders. Here a is the thickness of obstacles, and 3 the width of a sensor beam. The 
values locc and free in lines 9 and 11 denote the amount of evidence a reading carries 
for the two different cases. 


cell is shorter than the measured range by more than a/2. The left and center 
panel of Figure 9.3 illustrates this calculation for the main cone of a sonar 
beam. 

A typical application of an inverse sensor model for ultrasound sensors 
is shown in Figure 9.4. Starting with an initial map the robot successively 
extends the map by incorporating local maps generated using the inverse 
model. A larger occupancy grid map obtained with this model for the same 
environment is depicted in Figure 9.5. 

Figures 9.6 shows an example map next to a blueprint of a large open 
exhibit hall, and relates it to the occupancy map acquired by a robot. The 
map was generated using laser range data gathered in a few minutes. The 
gray-level in the occupancy map indicates the posterior of occupancy over an 
evenly spaced grid: The darker a grid cell, the more likely it is to be occupied. 
While occupancy maps are inherently probabilistic, they tend to quickly con- 
verge to estimates that are close to the two extreme posteriors, 1 and 0. In 
comparison between the learned map and the blueprint, the occupancy grid 
map shows all major structural elements, and obstacles as they were visi- 


»js.on 000000 


9.2 The Occupancy Grid Mapping Algorithm 289 
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Figure 9.4 Incremental learning of an occupancy grid map using ultra-sound data 
in a corridor environment. The upper left image shows the initial map and the lower 
right image contains the resulting map. The maps in columns 2 to 4 are the local maps 
built from an inverse sensor model. Measurements beyond a 2.5m radius have not 


been considered. Each cone has an opening angle of 15 degrees. Images courtesy of 
Cyrill Stachniss, University of Freiburg. 











Figure 9.5 Occupancy probability map of an office environment built from sonar 
measurements. Courtesy of Cyrill Stachniss, University of Freiburg. 


5ris.cn 000000 


290 9 Occupancy Grid Mapping 





Figure 9.6 (a) Occupancy grid map and (b) architectural blue-print of a large open 
exhibit space. Notice that the blue-print is inaccurate in certain places. 
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(a) 


(b) 





Figure 9.7 (a) Raw laser range data with corrected pose information. Each dot cor- 
responds to a detection of an obstacle. Most obstacles are static (walls etc.), but some 
were dynamic, since people walked near the robot during data acquisition. (b) Occu- 
pancy grid map built from the data. The gray-scale indicates the posterior probability: 
Black corresponds to occupied with high certainty, and white to free with high cer- 
tainty. The gray background color represents the prior. Figure (a) courtesy of Steffen 
Gutmann. 
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Figure 9.8 Estimation of occupancy maps using stereo vision: (a) camera image, (b) 
sparse disparity map, (c) occupancy map by projecting the disparity image onto the 
2-D plane and convolving the result with a Gaussian. Images courtesy of Thorsten 
Fróhlinghaus. 


ble at the height of the laser. Through a careful inspection, the reader may 
uncover some small discrepancies between the blueprint and the actual en- 
vironment configuration. 

Figure 9.7 compares a raw dataset with the occupancy grid maps gener- 
ated from this data. The data in Panel (a) was preprocessed by a SLAM 
algorithm, so that the poses align. Some of the data is corrupted by the pres- 
ence of people; the occupancy grid map filters out people quite nicely. This 
makes occupancy grid maps much better suited for robot navigation than 
sets of scan endpoint data: A planner fed the raw sensor endpoints would 
have a hard time finding a path through such scattered obstacles, even if 
the evidence that the corresponding cell is free outweighed that of it being 
occupied. 

We note that our algorithm makes occupancy decisions exclusively based 
on sensor measurements. An alternative source of information is the space 
claimed by the robot itself: When the robot's pose is x, the region surround- 
ing z, must be navigable. Our inverse measurement algorithm in Table 9.2 
can easily be modified to incorporate this information, by returning a large 
negative number for all grid cells occupied by a robot when at z;. In practice, 
it is a good idea to incorporate the robot's volume when generating maps, 
especially if the environment is populated during mapping. 
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Multi-Sensor Fusion 


Robots are often equipped with more than one type of sensor. Hence, a nat- 
ural objective is to integrate information from more than one sensor into a 
single map. This question as to how to best integrate data from multiple 
sensors is particularly interesting if the sensors have different characteristics. 
For example, Figure 9.8 shows occupancy maps built with a stereo vision sys- 
tem, in which disparities are projected onto the plane and convolved with a 
Gaussian. Clearly, the characteristics of stereo are different from that of a 
sonar-based range finder. They are sensitive to different types of obstacles. 

Unfortunately, fusing data from multiple sensors with Bayes filters is 
not an easy endeavor. A naive solution is to execute algorithm occu- 
pancy_grid_mapping in Table 9.1 with different sensor modalities. How- 
ever, such an approach has a clear drawback. If different sensors detect dif- 
ferent types of obstacles, the result of Bayes filtering is ill-defined. Consider, 
for example, an obstacle that can be recognized by one sensor type but not by 
another. Then these two sensor types will generate conflicting information, 
and the resulting map will depend on the amount of evidence brought by ev- 
ery sensor system. This is generally undesirable, since whether or not a cell 
is considered occupied depends on the relative frequency at which different 
sensors are polled. 

A popular approach to integrating information from multiple sensors is 
to build separate maps for each sensor type, and integrate them using an 
appropriate combination function. Let m* = {m*} denote the map built by 
the k-th sensor type. If the measurements of the sensors are independent of 
each other we can directly combine them using De Morgan's law 


p(m;) = 1 - lI (1 — p(m/)) 


k 


Alternatively, one can compute the maximum 
p(m;) = max p(mj) 


of all maps, which yields the most pessimistic estimates of its components. If 
any of the sensor-specific maps show that a grid cell is occupied, so will the 
combined map. 
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Learning Inverse Measurement Models 


Inverting the Measurement Model 


The occupancy grid mapping algorithm requires a marginalized inverse mea- 
surement model, p(m; | x, z). This probability is called "inverse" since it rea- 
sons from effects to causes: it provides information about the world condi- 
tioned on a measurement caused by this world. It is marginalized for the i-th 
grid cell; a full inverse would be of the type p(m | z, z). In our exposition of 
the basic algorithm, we already provided an ad hoc procedure in Table 9.2 for 
implementing such an inverse model. This raises the question as to whether 
we can obtain an inverse model in a more principled manner, starting at the 
conventional measurement model. 

The answer is positive but less straightforward than one might assume at 
first glance. Bayes rule suggests 


p(z | z, m) p(m | x) 
p(z | x) 
= 1 p(z|x,m) p(m) 





p(m|a,z) = 


Here we silently assume p(m | x) = p(m), hence the pose of the robot tells 
us nothing about the map—an assumption that we will adopt for sheer con- 
venience. If our goal was to calculate the inverse model for the entire map 
at-a-time, we would now be done. However, our occupancy grid mapping 
algorithm approximates the posterior over maps by its marginals, one for 
each grid cell m;. The inverse model for the i-th grid cell is obtained by 
selecting the marginal for the i-th grid cell: 


p(mi|z,z) = m X p(z|a,m) p(m) 


m:m(i)-mi; 


This expression sums over all maps m for which the occupancy value of grid 
cell ¿ equals m;. Clearly, this sum cannot be computed, since the space of all 
maps is too large. 

We will now describe an algorithm for approximating this expression. The 
algorithm involves generating samples from the measurement model, and 
approximating the inverse using a supervised learning algorithm, such as logis- 
tic regression or a neural network. 
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Sampling from the Forward Model 


The basic idea is simple and quite universal: If we can generate random 
triplets of poses gl, measurements zll, and map occupancy values m”! for 
any grid cell m;, we can learn a function that accepts a pose x and measure- 
ment z as an input, and outputs the probability of occupancy for m;. 

A sample of the form Cu ZH m 


procedure. 


) can be generated by the following 


1. Sample a random map m!*! ~ p(m). For example, one might already have 
a database of maps that represents p(m) and randomly draws a map from 
the database. 


2. Sample a pose al inside the map. One may safely assume that poses are 


uniformly distributed. 


3. Sample a measurement ZH ~ plz | ltl ml*l). This sampling step is rem- 


iniscent of a robot simulator that stochastically simulates a sensor mea- 
surement. 


4. Extract the desired “true” occupancy value mi for the target grid cell from 
the map m. 


The result is a sampled pose al a measurement i, and the occupancy 


value of the grid cell m;. Repeated application of this sampling step yields a 
data set 


ae ze — occ(m,)Ul 
Tz; 2% — oc(m;)” 


Tj. zy —  occ(m,)Pl 


These triplets may serve as training examples for the supervised learning algo- 
rithm, which approximates the desired conditional probability p(mi | z, x). 
Here the measurements z and the pose x are input variables, and the occu- 
pancy value occ(mj) is a target for the output of the learning algorithm. 
This approach is somewhat inefficient, since it fails to exploit a number of 
properties that we know to be the case for the inverse sensor model. 


* Measurements should carry no information about grid cells far outside 
their perceptual range. This observation has two implications: First, we 
can focus our sample generation process on triplets where the cell m; is 
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indeed inside the measurement cone. And second, when making a pre- 
diction for this cell, we only have to include a subset of the data in a mea- 
surement z (e.g., nearby beams) as input to the learning algorithm. 


* The characteristics of a sensor are invariant with respect to the absolute 
coordinates of the robot or the grid cell when taking a measurement. 
Only the relative coordinates matter. If we denote the robot pose by 
xı = (x y 0)! and the coordinates of the grid cell by m; = (zm, ym,)7 
the coordinates of the grid cell are mapped into the robot's local reference 
frame via the following translation and rotation: 


cosÜ —sind Lin, — t 
sin cos0 Ym; — Y 
In robots with circular arrays of range finders, it makes sense to encode 


the relative location of a grid cell using the familiar polar coordinates 
(range and bearing). 


, 


* Nearby grid cells should have a similar interpretation under the inverse 
sensor model. This smoothness suggests that it may be beneficial to learn 
a single function in which the coordinates of the grid cell function as an 
input, rather than learning a separate function for each grid cell. 


* |f the robot possesses functionally identical sensors, the inverse sen- 
sor model should be interchangeable for different sensors. For robots 
equipped with a circular array of range sensors, any of the resulting sen- 
sor beams are characterized by the same inverse sensor model. 


The most basic way to enforce these invariances is to constrain the learn- 
ing algorithm by choosing appropriate input variables. A good choice is to 
use relative pose information, so that the learning algorithm cannot base its 
decision on absolute coordinates. It is also a good idea to omit sensor mea- 
surements known to be irrelevant to occupancy predictions, and to confine 
the prediction to grid cells inside the perceptual field of a sensor. By exploit- 
ing these invariances, the training set size can be reduced significantly. 


The Error Function 


To train the learning algorithm, we need an approximate error function. A 
popular example are artificial neural networks trained with the Backpropa- 
gation algorithm. Backpropagation trains neural networks by gradient descent 
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in parameter space. Given an error function that measures the “mismatch” 
between the network’s actual and desired output, Backpropagation calcu- 
lates the first derivative of the target function and the parameters of the 
neural network, and then adapts the parameters in opposite direction of the 
gradient so as to diminish the mismatch. This raises the question as to what 
error function to use. 

A common approach is to train the learning algorithm so as to maximize 
the log-likelihood of the training data. More specifically we are given a train- 
ing set of the form 


input! —5 — occ(m;)!! 
input?) —5à5 — occ(m,)P] 
input”!  —5  oce(m,)!! 


occ(m;)'* is the k-th sample of the desired conditional probability, and 
input!*l is the corresponding input to the learning algorithm. Clearly, the 
exact form of the input may vary as a result of the encoding known invari- 
ances, but the exact nature of this vector will play no role in the form of the 
error function. 

Let us denote the parameters of the learning algorithm by W. Assuming 
that each individual item in the training data has been generated indepen- 
dently, the likelihood of the training data is now 


Tom | input, w) 
and its negative logarithm is 


J(W) = -X log pm"! | input! W) 


Here J defines the function we seek to minimize during training. 

Let us denote the learning algorithm by f (input ^l , W). The output of this 
function is a value in the interval [0; 1]. After training, we want the learning 
algorithm to output the probability of occupancy: 


f(input!*!, W) if m 


[k] | ; [k] B 
m! th W 
Pan apu ) kun T if m; 0 


Thus, we seek an error function that adjusts W so as to minimize the devia- 
tion of this predicted probability and the one communicated by the training 
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example. To find such an error function, we re-write (9.16) as follows: 
[k] |. [k] = , [k] ml" gg [k] 1 一 mp] 

p(m;"|input",W) = f(input,w)™ (1 — fünput'*,W)) = 

It is easy to see that this product and Expression (9.16) are identical. In the 

product, one of the terms is always 1, since its exponent is zero. Substituting 

the product into (9.15) and multiplying the result by minus one gives us the 

following function: 


gO) E -È o|; (inputi, wr)? (1 — (input, w))t7m? 


- Y m log f(input™!, W) + (1 — m!"!) log(1 — f (input, W)) 


J(W) is the error function to minimize when training the learning algorithm. 
It is easily folded into any algorithm that uses gradient descent to adjusts its 
parameters. 


Examples and Further Considerations 


Figure 9.9 shows the result of an artificial neural network trained to mimic 
the inverse sensor model. The robot in this example is equipped with a cir- 
cular array of sonar range sensors mounted at approximate table height. The 
input to the network are the relative range and bearing of a target cell, along 
with the set of five adjacent range measurements. The output is a probability 
of occupancy: the darker a cell, the more likely it is occupied. As this ex- 
ample illustrates, the approach correctly learns to distinguish freespace from 
occupied space. The uniformly gray area behind obstacles matches the prior 
probability of occupancy, which leads to no change when used in the occu- 
pancy grid mapping algorithm. Figure 9.9b contains a faulty short reading 
on the bottom left. Here a single reading seems to be insufficient to predict 
an obstacle with high probability. 

We note that there exists a number of ways to train a function approxi- 
mator using actual data collected by a robot, instead of the simulated data 
from the forward model. In general, this is the most accurate data one can 
use for learning, since the measurement model is necessarily just an approx- 
imation. One such way involves a robot operating in a known environment 
with a known map. With Markov localization, we can localize the robot, and 
then use its actual recorded measurements and the known map occupancy 
to assemble training examples. It is even possible to start with an approxi- 
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Figure 9.9 Inverse sensor model learned from data: Three sample sonar scans (top 
row) and local occupancy maps (bottom row), as generated by the neural network. 
Bright regions indicate free-space, and dark regions indicate walls and obstacles (en- 
larged by a robot diameter). 


mate map, use the learned sensor model to generate a better map, and use 
the procedure just outlined to improve the inverse measurement model. 


Maximum A Posteriori Occupancy Mapping 


The Case for Maintaining Dependencies 


In the remainder of this chapter, we will return to one of the very basic as- 
sumptions of the occupancy grid mapping algorithm. In Chapter 9.2, we 
assumed that we can safely decompose the map inference problem defined 
over high-dimensional space of all maps, into a collection of single-cell map- 
ping problems. This assumption culminated into the factorization in (9.4): 


p(m | Zit i:t) II p(m; | Picts Bit) 
i 


This raises the question as to how faithful we should be in the result of any 
algorithm that relies on such a strong decomposition. 
Figure 9.10 illustrates a problem that arises directly as a result of this factor- 
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Figure 9.10 The problem with the standard occupancy grid mapping algorithm in 
Chapter 9.2: For the environment shown in Figure (a), a passing robot might receive 
the (noise-free) measurement shown in (b). The factorial approach maps these beams 
into probabilistic maps separately for each grid cell and each beam, as shown in (c) 
and (d). Combining both interpretations yields the map shown in (e). Obviously, 
there is a conflict in the overlap region, indicated by the circles in (e). The interesting 
insight is: There exist maps, such as the one in diagram (f), that perfectly explain the 
sensor measurement without any such conflict. For a sensor reading to be explained, 
it suffices to assume an obstacle somewhere in the cone of a measurement, and not 
everywhere. 


ization. Shown there is a situation in which the robot facing a wall receives 
two noise-free sonar range measurements. Because the factored approach 
predicts an object along the entire arc at the measured range, the occupancy 
values of all grid cells along this arc are increased. When combining the 
two different measurements shown in Figure 9.10c&d, a conflict is created, 
as shown in Figure 9.10e. The standard occupancy grid mapping algorithm 
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T Algorithm MAP_occupancy_grid_mapping(71.2, 21:4): 
2: set m = {0} 
3: repeat until convergence 
4: for all cells m; do 
5: m; = argmax k lo + 5 log 
k=0,1 7 
measurement_model(z, x+, m with m; = k) 
6: endfor 
7: endrepeat 
8: return m 











Table9.3 The maximum a posteriori occupancy grid algorithm, which uses conven- 
tional measurement models instead of inverse models. 


"resolves" this conflict by summing up positive and negative evidence for 
occupancy; however, the result will reflect the relative frequencies of the two 
types of measurements, which is undesirable. 

However, there exist maps, such as the one in Figure 9.10f, that perfectly 
explain the sensor measurements without any such conflict. This is because 
for a sensor reading to be explained, it suffices to assume an obstacle some- 
where in its measurement cone. Put differently, the fact that cones sweep 
over multiple grid cells induces important dependencies between neighbor- 
ing grid cells. When decomposing the mapping into thousands of individual 
grid cell estimation problems, we lose the ability to consider these dependen- 
cies. 


Occupancy Grid Mapping with Forward Models 


These dependencies are incorporated by an algorithm that outputs the mode 
of the posterior, instead of the full posterior. The mode is defined as the 
maximum of the logarithm of the map posterior: 


m* = argmax logp(m | 21:4, 14) 
m 
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The map posterior factors into a map prior and a measurement likelihood 
(c.f., Equation (9.11)): 


log p(m | 21:2,%12) = const. + logp(zia | 212,m) + log p(m) 


The log-likelihood log p(zi:; | 21:4, m) decomposes into a sum of individual 
measurement log-likelihoods: 


logp(2 | tim) = y log p(x | ve, m) 


Further, the log-prior also decomposes. To see, we note that the prior proba- 
bility of any map m is given by the following product: 


pm) = [J p(m)™ 0 -»m)'^ 


l 
3 
S 
= 

B 
T 
| 
m. 
& 
~ 

8 


Here p(m) is the prior probability of occupancy (e.g., p(m) — 0.5), and N is 
the number of grid cells in the map. The expression (1 — p(m))" is simply a 
constant, which is replaced by our generic symbol 7 as usual. 

This implies for the log version of the prior: 


logp(m) = const. + 5 m; logp(m) 一 mi log(1 — p(m)) 


= const. 十 5 m; log um 


= const. 十 5 m; lo 
i 


The constant lọ is adopted from (9.7). The term M log(1 — p(m;)) is obvi- 
ously independent of the map. Hence it suffices to optimize the remaining 
expression and the data log-likelihood: 


m* = argmax V log p(zi | x, m) + lo X m; 
e t i 


A hill-climbing algorithm for maximizing this log-probability is provided in 
Table 9.3. This algorithm starts with the all-free map (line 2). It “flips” the 
occupancy value of a grid cell when such a flip increases the likelihood of the 
data (lines 4-6). For this algorithm it is essential that the prior of occupancy 
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Figure 9.11 (a) Sonar range measurements from a noise-free simulation. (b) Results 
of the standard occupancy mapper, lacking the open door. (c) A maximum a posterior 
map. (d) The residual uncertainty in this map, obtained by measuring the sensitivity 
of the map likelihood function with respect to individual grid cells. This map clearly 
shows the door, and it also contains flatter walls at both ends. 


p(m;) is not too close to 1; otherwise it might return an all-occupied map. As 
any hill climbing algorithm, this approach is only guaranteed to find a local 
maximum. In practice, there are usually very few, if any, local maxima. 
Figure 9.11 illustrates the effect of the MAP occupancy grid algorithm. Fig- 
ure 9.11a depicts a noise-free data set of a robot passing by an open door. 
Some of the sonar measurements detect the open door, while others are re- 
flected at the door post. The standard occupancy mapping algorithm with 
inverse models fails to capture the opening, as shown in Figure 9.11b. The 
mode of the posterior is shown in Figure 9.11c. This map models the open 
door correctly, hence it is better suited for robot navigation than the standard 
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occupancy grid map algorithm. Figure 9.11d shows the residual uncertainty 
of this map. This diagram is the result of a cell-wise sensitivity analysis: The 
magnitude by which flipping a grid cell decreases the log-likelihood function 
is illustrated by the grayness of a cell. This diagram, similar in appearance 
to the regular occupancy grid map, suggests maximum uncertainty for grid 
cells behind obstacles. 

There exists a number of limitations of the algorithm MAP occ- 
upancy_grid_mapping, and it can be improved in multiple ways. The algo- 
rithm is a maximum a posteriori approach, and as such returns no notion of 
uncertainty in the residual map. Our sensitivity analysis approximates this 
uncertainty, but this approximation is overconfident, since sensitivity analy- 
sis only inspects the mode locally. Further, the algorithm is a batch algorithm 
and cannot be executed incrementally. In fact, the MAP algorithm requires 
that all data is kept in memory. At the computational end, the algorithm can 
be sped up by initializing it with the result of the regular occupancy grid 
mapping approach, instead of an empty map. Finally, we note that only a 
small number of measurements are affected by flipping a grid cell in line 5 
of Table 9.3. While each sum is potentially huge, only a small number of el- 
ements has to be inspected when calculating the argmax. This property can 
be exploited in the basic algorithm, to increase its computational efficiency. 


Summary 


This chapter introduced algorithms for learning occupancy grids. All algo- 
rithms in this chapter require exact pose estimates for the robot, hence they 
do not solve the general mapping problem. 


* The standard occupancy mapping algorithm estimates for each grid cell 
individually the posterior probability of occupancy. It is an adaptation of 
the binary Bayes filter for static environments. 


* Data from multiple sensors can be fused into a single map in two ways: 
By maintaining a single map using Bayes filters, and by maintaining mul- 
tiple maps, one for each sensor modality. The latter approach is usually 
preferable when different sensors are sensitive to different types of obsta- 
cles. 


* The standard occupancy grid mapping algorithm relies on inverse mea- 
surement models, which reason from effects (measurements) to causes 
(occupancy). This differs from previous applications of Bayes filters in the 
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context of localization, where the Bayes filter was based on a conventional 
measurement model that reasons from causes to effects. 


e It is possible to learn inverse sensor models from the conventional mea- 
surement model, which models the sensor from causes to effects. To do 
so, one has to generate samples and learn an inverse model using a super- 
vised learning algorithm. 


。 The standard occupancy grid mapping algorithm does not maintain de- 
pendencies in the estimate of occupancy. This is a result of decomposing 
the map posterior estimation problem into a large number of single-cell 
posterior estimation problem. 


* The full map posterior is generally not computable, due to the large num- 
ber of maps that can be defined over a grid. However, it can be max- 
imized. Maximizing it leads to maps that are more consistent with the 
data than the occupancy grid algorithm using Bayes filters. However, the 
maximization requires the availability of all data, and the resulting max- 
imum a posterior map does not capture the residual uncertainty in the 
map. 


Without a doubt, occupancy grid maps and their various extensions are 
vastly popular in robotics. This is because they are extremely easy to acquire, 
and they capture important elements for robot navigation. 


Bibliographical Remarks 


Occupancy grid maps are due to Elfes (1987), whose Ph.D. thesis (1989) defined the field. A well- 
referenced article by Moravec (1988) provides a highly accessible introduction into this topic, 
and lays out the basic probabilistic approach that forms the core of this chapter. In unpublished 
work, Moravec and Martin (1994) extended occupancy grid maps to 3-D, using stereo as the 
primary sensor. Multi-sensor fusion in occupancy grids were introduced in Thrun et al. (1998a). 
The results for learning inverse sensor models described in this chapter can be found in Thrun 
(1998b). The forward modeling approach, also described in this chapter, is based on a similar 
algorithm in Thrun (2003). 

Occupancy maps have been used for a number of different purposes. Borenstein and Koren 
(1991) were the first to adopt occupancy grid maps for collision avoidance. A number of au- 
thors have used occupancy grid maps for localization, by cross-matching two occupancy grid 
maps. Such “map matching” algorithms are discussed in detail in Chapter 7. Biswas et al. (2002) 
used occupancy grid maps to learn shape models of movable objects in dynamic environments. 
This approach was later extended to learning hierarchical class models of dynamic objects, all 
represented with occupancy grid maps (Anguelov et al. 2002). Occupancy grid maps have also 
extensively been used in the context of the simultaneous localization and mapping problem. 
Those applications will be discussed in later chapters. 
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Figure 9.12 Mobile indoor robot of the type RWI B21, with 24 sonar sensors 
mounted on a circular array around the robot. 


The idea of representing space by grids is only one out of many ideas explored in the mo- 
bile robotics literature. Classical work on motion planning often assumes that the environment 
is represented by polygons, but leaves open as to how those models are being acquired from 
data (Schwartz et al. 1987). An early proposal on learning polygonal maps is due to Chatila and 
Laumond (1985). A first implementation using Kalman filters for fitting lines from sonar data 
was done by Crowley (1989). In more recent work, Anguelov et al. (2004) devised techniques for 
identifying straight-line doors from raw sensor data, and learn visual attributes for improving 
the door detection rate. 

An early paradigm in spatial representation is the topological paradigm, in which space is 
represented by a set of local relations, often corresponding to specific actions a robot may have 
to take to navigate between adjacent locations. Examples of topological mapping algorithm 
include Kuipers and Levitt’s (1988) work on their Spatial Semantic hierarchy (see also Kuipers 
et al. (2004)); Matarié's (1990) M.Sc. thesis work, Kortenkamp and Weymouth’s (1994) work on 
topological graphs contracted from sonar and vision data, and Shatkay and Kaelbling’s (1997) 
approach on spatial HMMs with arc length information. Occupancy grid maps are members of 
the complimentary paradigm: metric representations. Metric representations directly describe 
the robot environment, in some absolute coordinate system. A second example of a metric 
approach is the EKF SLAM algorithm, which will be discussed in the subsequent chapter. 

There is a history of attempts to generate mapping algorithms that harvest the best of both 
paradigms, topological and metric. Tomatis et al. (2002) uses topological representations to close 
loops consistently, then converts to metric maps. Thrun (1998b) first builds a metric occupancy 
grid map, then extracts a topological skeleton to facilitate fast motion planning. In Chapter 11, 
we will study techniques that bridge both paradigms, metric and topological. 
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9.7 Exercises 


EXPONENTIAL DECAY 


1. Change the basic occupancy grid algorithm in Table 9.1 to include a pro- 


vision for the change of occupancy over time. To accommodate such 
change, evidence collected At time steps in the past should be decayed 
by a factor of o/^*, for some value of a < 1 (e.g., œ = 0.99). Such a rule is 
called exponential decay. State the exponential decay occupancy grid map- 
ping algorithm in log odds form and argue its correctness. If you cannot 
find an exact algorithm, state an approximation and argue why it is a 
suitable approximation. For simplicity, you might want to assume a prior 
p(m,) = 0.5 for occupancy. 


. The binary Bayes filter assumes that a cell is either occupied or unoccu- 


pied, and the sensor provides noisy evidence for the correct hypothesis. 
In this question, you will be asked to build an alternative estimator for a 
grid cell: Suppose the sensor can only measure "0 = unoccupied” or “1 = 
occupied", and it receives a sequence 


0, 0, 1, 0, 1, 1, 1, 0, 1, 0. 


What is the maximum likelihood probability p for the next reading to be 
1? Provide an incremental formula for a general maximum likelihood 
estimator for this probability p. Discuss the difference of this estimator to 
the binary Bayes filter (all for a single cell only). 


. We study a common sensor configuration in indoor robotics. Suppose an 


indoor robot uses sonar sensors with a 15 degree opening cone, mounted 
at a fixed height so that they point out horizontally and parallel to the 
ground. Figure 9.12 shows such a robot. Discuss, what happens when the 
robot faces an obstacle whose height is just below the height of the sensor 
(for example, 15 cm below). Specifically, answer the following questions. 


(a) Under what conditions will the robot detect the obstacle? Under what 
conditions will it fail to detect it? Be concise. 


(b) What implications does this all have for the binary Bayes filter and the 
underlying Markov assumption? How can you make the occupancy 
grid algorithm fail? 


(c) Based on your answer to the previous question, can you provide an im- 
proved occupancy grid mapping algorithm that will detect the obstacle 
more reliably than the plain occupancy grid mapping algorithm? 
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4. In this question, you are being asked to design a simple sensor model. 
Suppose you are given binary occupancy measurements for the following 








four cells: 
cell number type measurement sequence 
cell 1 occupied | 1 1 0 1 0 1 1 
cell 2 occupied |O 1 1 1 0. 0 1 
cell 3 free 0000000 
cell 4 free 1 0 01000 

















What is the maximum likelihood measurement model p(z | m;)? (Hint: 
m; is a binary occupancy variable, and z is a binary measurement vari- 


able.) 
5. For the table in Exercise 4, implement the basic occupancy grid algorithm. 


(a) What is the posterior p(m; | 21:7) for the four different cases, assuming 
a prior p(m;) = 0.5? 

(b) Devise a tuning algorithm for your sensor model that makes the out- 
put of your occupancy grid mapping algorithm as close as possible to 
the ground truth, for the four cases in Exercise 4. What do you find? 
(For this question, you will have to come up with a suitable closeness 
measure.) 


6. The standard occupancy grid mapping algorithm is implemented us- 
ing the log odds form, even though it would have equally been imple- 
mentable using probabilities. 


(a) Derive an update rule that represented occupancy probabilities di- 
rectly, without the detour of the log odds representation. 


(b) For an implementation in a common programming language such as 
C++, give an example in which the probability implementation yields 
different results form the log odds implementation, due to numerical 
truncation. Explain your example, and argue whether you judge this 
to be a problem in practice. 
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Simultaneous Localization and 
Mapping 


Introduction 


This and the following chapters address one of the most fundamental prob- 
lems in robotics, the simultaneous localization and mapping problem. This prob- 
lem is commonly abbreviated as SLAM, and is also known as Concurrent 
Mapping and Localization, or CML. SLAM problems arise when the robot does 
not have access to a map of the environment, nor does it know its own pose. 
Instead, all it is given are measurements 2;., and controls u;.;. The term “si- 
multaneous localization and mapping” describes the resulting problem: In 
SLAM, the robot acquires a map of its environment while simultaneously lo- 
calizing itself relative to this map. SLAM is significantly more difficult than 
all robotics problems discussed thus far. It is more difficult than localization 
in that the map is unknown and has to be estimated along the way. It is more 
difficult than mapping with known poses, since the poses are unknown and 
have to be estimated along the way. 

From a probabilistic perspective, there are two main forms of the SLAM 
problem, which are both of equal practical importance. One is known as the 
online SLAM problem: It involves estimating the posterior over the momen- 
tary pose along with the map: 


plz, m | Zits U1:t) 


Here zx; is the pose at time t, m is the map, and z1.; and u,.; are the measure- 
ments and controls, respectively. This problem is called the online SLAM 
problem since it only involves the estimation of variables that persist at time 
t. Many algorithms for the online SLAM problem are incremental: they dis- 
card past measurements and controls once they have been processed. The 
graphical model of online SLAM is depicted in Figure 10.1. 
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Figure 10.1 Graphical model of the online SLAM problem. The goal of online SLAM 
is to estimate a posterior over the current robot pose along with the map. 


The second SLAM problem is called the full SLAM problem. In full SLAM, 
we seek to calculate a posterior over the entire path z,.; along with the map, 
instead of just the current pose z; (see also Figure 10.2): 


p(zia,m | Zt) U14) 


This subtle difference between online and full SLAM has ramifications in 
the type of algorithms that can be brought to bear. In particular, the online 
SLAM problem is the result of integrating out past poses from the full SLAM 
problem: 


plz m | 21:5, 1:1) = J [o [ome | Zits Ut) dx, dz... dvi. 


In online SLAM, these integrations are typically performed one-at-a-time. 
They cause interesting changes of the dependency structures in SLAM that 
we will fully explore in the next chapter. 

A second key characteristic of the SLAM problem has to do with the na- 
ture of the estimation problem. SLAM problems possess a continuous and a 
discrete component. The continuous estimation problem pertains to the lo- 
cation of the objects in the map and the robot’s own pose variables. Objects 
may be landmarks in feature-based representation, or they might be object 
patches detected by range finders. The discrete nature has to do with cor- 
respondence: When an object is detected, a SLAM algorithm must reason 
about the relation of this object to previously detected objects. This reason- 
ing is typically discrete: Either the object is the same as a previously detected 
one, or it is not. 
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Figure 10.2 Graphical model of the full SLAM problem. Here, we compute a joint 
posterior over the whole path of the robot and the map. 


We already encountered similar continuous-discrete estimation problems 
in previous chapters. For example, EKF localization in Chapter 7.4 estimates 
the robot pose, which is continuous. But to do so it also estimates the corre- 
spondences of measurements and landmarks in the map, which are discrete. 
In this and the subsequent chapters, we will discuss a number of different 
techniques to deal with the continuous and the discrete aspects of the SLAM 
problem. 

At times, it will be useful to make the correspondence variables explicit, 
as we did in Chapter 7 on localization. The online SLAM posterior is then 
given by 


p(t, M, Ct | Zl: u1:) 
and the full SLAM posterior by 
p(zia, M, C1:t | Zl its Uia) 


The online posterior is obtained from the full posterior by integrating out 
past robot poses and summing over all past correspondences: 


p(z«, m, Ct | 21:4, 014) 
x n s" QS 35 XO pnus mea | 214, 014) d£1 dza ... data 
Ci C2 Ct—1 


In both versions of the SLAM problems—online and full—estimating the full 
posterior (10.4) or (10.5) is the gold standard of SLAM. The full posterior 
captures all there is to be known about the map and the pose or the path. 
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In practice, calculating a full posterior is usually infeasible. Problems arise 
from two sources: (1) the high dimensionality of the continuous parameter 
space, and (2) the large number of discrete correspondence variables. Many 
state-of-the-art SLAM algorithms construct maps with tens of thousands of 
features, or more. Even under known correspondence, the posterior over 
those maps alone involves probability distributions over spaces with 10° or 
more dimensions. This is in stark contrast to localization problems, in which 
posteriors were estimated over three-dimensional continuous spaces. Fur- 
ther, in most applications the correspondences are unknown. The number of 
possible assignments to the vector of all correspondence variables c;:; grows 
exponentially in the time ¢. Thus, practical SLAM algorithms that can cope 
with the correspondence problem must rely on approximations. 

The SLAM problem will be discussed in a number of subsequent chap- 
ters. The remainder of this chapter develops an EKF algorithm for the on- 
line SLAM problem. Much of this material builds on Chapter 3.3, where 
the EKF was introduced, and Chapter 7.4, where we applied the EKF to the 
mobile robot localization problem. We will derive a progression of EKF al- 
gorithms that first apply EKFs to SLAM with known correspondences, and 
then progress to the more general case with unknown correspondences. 


SLAM with Extended Kalman Filters 


Setup and Assumptions 


Historically the earliest—and perhaps the most influential —SLAM algo- 
rithm is based on the extended Kalman filter, or EKF. In a nutshell, the EKF 
SLAM algorithm applies the EKF to online SLAM using maximum likeli- 
hood data association. In doing so, EKF SLAM is subject to a number of 
approximations and limiting assumptions: 

Maps, in EKF SLAM, are feature-based. They are composed of point land- 
marks. For computational reasons, the number of point landmarks is usually 
small (e.g., smaller than 1,000). Further, the EKF approach tends to work 
well the less ambiguous the landmarks are. For this reason, EKF SLAM re- 
quires significant engineering of feature detectors, sometimes using artificial 
beacons as features. 

As any EKF algorithm, EKF SLAM makes a Gaussian noise assumption for 
robot motion and perception. The amount of uncertainty in the posterior 
must be relatively small, since otherwise the linearization in EKFs tend to 
introduce intolerable errors. 
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The EKF SLAM algorithm, just like the EKF localizer discussed in Chap- 
ter 7.4, can only process positive sightings of landmarks. It cannot process 
negative information that arises from the absence of landmarks in sensor 
measurements. This is a direct consequence of the Gaussian belief represen- 
tation and was already discussed in Chapter 7.4. 


SLAM with Known Correspondence 


The SLAM algorithm for the case with known correspondence addresses the 
continuous portion of the SLAM problem only. Its development is in many 
ways parallel to the derivation of the EKF localization algorithm in Chap- 
ter 7.4, but with one key difference: In addition to estimating the robot pose 
x, the EKF SLAM algorithm also estimates the coordinates of all landmarks 
encountered along the way. This makes it necessary to include the landmark 
coordinates into the state vector. 

For convenience, let us call the state vector comprising robot pose and the 
map the combined state vector, and denote this vector y+. The combined vector 
is given by 


Tt 
Ut = 
m 
- 0 T 
= (my Mie My $1 Mo. may S2 ... mN,a mN, SN ) 


Here x, y, and 0 denote the robot's coordinates at time t (not to be confused 
with the state variables x; and y;), mj x, Mi y are the coordinates of the i-th 
landmark, fori = 1,..., N, and s; is its signature. The dimension of this state 
vector is 3N + 3, where N denotes the number of landmarks in the map. 
Clearly, for any reasonable number of N, this vector is significantly larger 
than the pose vector that is being estimated in Chapter 7.4, which introduced 
the EKF localization algorithm. EKF SLAM calculates the online posterior 
py | zia, Ur). 

The EKF SLAM algorithm is depicted in Table 10.1—notice the similarity to 
the EKF localization algorithm in Table 7.2. Lines 2 through 5 apply the mo- 
tion update, whereas lines 6 through 20 incorporate the measurement vector. 

Lines 3 and 5 manipulate the mean and covariance of the belief in accor- 
dance to the motion model. This manipulation only affects those elements 
of the belief distribution concerned with the robot pose. It leaves all mean 
and covariance variables for the map unchanged, along with the pose-map 
covariances. Lines 7 through 20 iterate through all measurements. The test 
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15: 


16: 


17: 
18: 
19: 
20: 
21: 


22: 
23: 


Algorithm EKF SLAM known, correspondences(u; 1, 31-1, Ut, zt, ct): 


1 0 0 0.0 
0 1 0 O--0 

Fz=| yg d 00 
3N 


e COS J4—1,0 — ZÈ cos(pi 1,9 + wc A) 








( = sin J4—1,0 + oe sin(ut—1,9 + wt At) 
vut 
0 0 ae COS J41—1,0 + DL cos(11—1,0 + wt AL) 
Gi, 一 了 十 天 了 0 0 P sin 141,9 + v. sin(ut—1,9 + wt AL) Fy 
0 0 0 
De = Ge Xa GP + FP Ri Fe 
o2 0 0 
Qv = 0 05 0 
0 0 c 
for all observed features zi = (ri $i si)T do 
了 一 ct 
if landmark j never seen before 
lj lta ri cos(ót + fit,a) 
Bjy |=| Bey |+| rt sin(d; + Bee) 
Bj,s Si 0 
endif 
E Oa m Bj Ted Lit,x 
NO N jy — Bty 
q= 676 
| và 
D atan2(ôy, ôx) — fit,e 
Bj,s 
1 0 0 .0.0 0 0 0 0:…0 
0 1 0 0-0 0 0 0 0...0 
0 0 1 0---0 0 0 0 0...0 
Fz j = 0 0 0 0.—.0 1 0 0 0...0 
0 0 0 0.—.0 0 1 0 0...0 
0 0 0 0:0 0 0 1 O---0 
WY WY 
35-3 3N 一 37 
—V/dó. -v55， O + /Gx ôy 0 
Hi = 7 dy —dx —q 一 0y 十 0x 0 zi 
0 0 0 0 0 q 


Ki — Yt HiT (Hi dt HiT +Q)! 
ji = ft + KG (2; — 24) 
Ot = (I — Ki Hi) Se 


endfor 
Ht = ht 
XQ = Nt 


return Ht, Ut 








Table 10.1 The EKF algorithm for the SLAM problem (with known correspon- 


dences). 
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in line 9 returns true only for landmarks for which we have no initial loca- 
tion estimate. For those, line 10 initializes the location of such a landmark by 
the projected location obtained from the corresponding range and bearing 
measurement. As we shall discuss below, this step is important for the lin- 
earization in EKFs; it would not be needed in linear Kalman filters. For each 
measurement, an “expected” measurement is computed in line 14, and the 
corresponding Kalman gain is computed in line 17. Notice that the Kalman 
gain is a matrix of size 3 by 3N + 3. This matrix is usually not sparse. In- 
formation is propagated through the entire state estimate. The filter update 
then occurs in lines 18 and 19, where the innovation is folded back into the 
robot’s belief. 

The fact that the Kalman gain is fully populated for all state variables— 
and not just the observed landmark and the robot pose—is important. In 
SLAM, observing a landmark does not just improve the position estimate 
of this very landmark, but that of other landmarks as well. This effect is 
mediated by the robot pose: Observing a landmark improves the robot pose 
estimate, and as a result it eliminates some of the uncertainty of landmarks 
previously seen by the same robot. The amazing effect here is that we do not 
have to model past poses explicitly—which would put us into the realm of 
the full SLAM problem and make the EKF a non-realtime algorithm. Instead, 
this dependence is captured in the Gaussian posterior, more specifically, in 
the off-diagonal covariance elements of the matrix X;. 

Figure 10.3 illustrates the EKF SLAM algorithm for an artificial example. 
The robot navigates from a start pose that serves as the origin of its coordi- 
nate system. As it moves, its own pose uncertainty increases, as indicated 
by uncertainty ellipses of growing diameter. It also senses nearby landmarks 
and maps them with an uncertainty that combines the fixed measurement 
uncertainty with the increasing pose uncertainty. As a result, the uncer- 
tainty in the landmark locations grows over time. In fact, it parallels that 
of the pose uncertainty at the time a landmark is observed. The interesting 
transition happens in Figure 10.3d: Here the robot observes the landmark it 
saw in the very beginning of mapping, and whose location is relatively well 
known. Through this observation, the robot's pose error is reduced, as indi- 
cated in Figure 10.3d— notice the very small error ellipse for the final robot 
pose! This observation also reduces the uncertainty for other landmarks in 
the map. This phenomenon arises from a correlation that is expressed in 
the covariance matrix of the Gaussian posterior. Since most of the uncer- 
tainty in earlier landmarks is caused by the robot pose, and since this very 
uncertainty persists over time, the location estimates of those landmarks are 
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(a) (b) 








(c) (d) 























Figure 10.3 EKF applied to the online SLAM problem. The robot's path is a dotted 
line, and its estimates of its own position are shaded ellipses. Eight distinguishable 
landmarks of unknown location are shown as small dots, and their location estimates 
are shown as white ellipses. In (a)-(c) the robot's positional uncertainty is increas- 
ing, as is its uncertainty about the landmarks it encounters. In (d) the robot senses 
the first landmark again, and the uncertainty of all landmarks decreases, as does the 
uncertainty of its current pose. Image courtesy of Michael Montemerlo, Stanford 
University. 


correlated. When gaining information on the robot's pose, this information 
spreads to previously observed landmarks. This effect is probably the most 
important characteristic of the SLAM posterior. Information that helps lo- 
calize the robot is propagated through map, and as a result improves the 
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localization of other landmarks in the map. 


Mathematical Derivation of EKF SLAM 


The derivation of the EKF SLAM algorithm for the case with known corre- 
spondences largely parallels that of the EKF localizer in Chapter 7.4. The key 
difference is the augmented state vector, which now includes the locations of 
all landmarks in addition to the robot pose. 

In SLAM, the initial pose is taken to be the origin of the coordinate system. 
This definition is somewhat arbitrary, in that it can be replaced by any coor- 
dinate. None of the landmark locations are known initially. The following 
initial mean and covariance express this belief: 


wo = (000... 0)? 
000 0 0 
000 0 0 
000 0 0 

Xo = 0 0 0 oo 0 
0 0 0 0 .oo 


The covariance matrix is of size (3N +3) x (3N 4-3). It is composed of a small 
3 x 3 matrix of zeros for the robot pose variables. All other covariance values 
are infinite. 

As the robot moves, the state vector changes according to the standard 
noise-free velocity model (see Equations (5.13) and (7.4)). In SLAM, this mo- 
tion model is extended to the augmented state vector: 


=Z sind + 2e sin(0 + G4 At) 
at cos0 — 2 cos(0 + wi At) 
UL NE 十 ^ At 
Ye = -ıt 0 





0 
The variables x, y, and 0 denote the robot pose in y+—1. Because the motion 


only affects the robot's pose and all landmarks remain where they are, only 
the first three elements in the update are non-zero. This enables us to write 
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the same equation more compactly: 
— 5 sind + 2 sin(0 + w,At) 
y = Yt FE T cos — 2 cos(0 + ux At) 
Wt At + Vt At 





Here F, is a matrix that maps the 3-dimensional state vector into a vector of 
dimension 3N + 3. 


3N columns 


The full motion model with noise is then as follows 


— 5 sind +  sin(0 + w,At) 


Wt 


Ye = Wit Fr A cos 0 一 了 cos(0 + w At) +N (0, FT R Fy) 
wAt 
—————— 
g(ut,yi-1) 


where FT R,F, extends the covariance matrix to the dimension of the full 
state vector squared. 

As usual in EKFs, the motion function g is approximated using a first de- 
gree Taylor expansion 


glut, Ye-1) © glut, pii) + Ge (ye—-1 — pia) 


where the Jacobian G, = g'(ui, ut—1) is the derivative of g with respect to 
Ye—1 at uz and Lu, as in Equation (7.7). 

Obviously, the additive form in (10.13) enables us to decompose this Jaco- 
bian into an identity matrix of dimension (3N + 3) x (3N +3) (the derivative 
of y:—1) plus a low-dimensional Jacobian g; that characterizes the change of 
the robot pose: 





Gi = IFI gF; 
with 
0 0 m COS Ht -1,0 + d cOS(L—1,0 + w,At) 
gt = 0 0 ee sin 141,0 + a sin(fir-1,9 + w,At) 
0 0 0 


Plugging these approximations into the standard EKF algorithm gives us 
lines 2 through 5 of Table 10.1. Obviously, several of the matrices multiplied 
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in line 5 are sparse, which should be exploited when implementing this al- 
gorithm. The result of this update are the mean ji; and the covariance 5. of 
the estimate at time t after updating the filter with the control u;, but before 
integrating the measurement z. 

The derivation of the measurement update is similar to the one in Chap- 
ter 7.4. In particular, we are given the following measurement model 





V (m; — 2)? + (my — y? or 0 0 
D o= atan2(m;, —y,mjs —z) —0 | +N(0, 0 o 0 J) 
Mj,s 0 0 o, 
 — MÀ à—— puc er 
h(yt,j) Qt 


Here x, y, and 0 denotes the pose of the robot, i is the index of an individual 
landmark observation in z;, and j = e is the index of the observed landmark 
at time t. The variable r denotes the range to a landmark, ó is the bearing to 
a landmark, and s the landmark signature; the terms or, 7g, and o, are the 
corresponding measurement noise covariances. 

This expression is approximated by the linear function 


hlni) ~ h(üjj) + At (ye — pa) 


Here Hj is the derivative of h with respect to the full state vector y+. Since 
h depends only on two elements of that state vector, the robot pose 2; 
and the location of the j-th landmark m;, the derivative factors into a low- 
dimensional Jacobian hi and a matrix F, j, which maps hj into a matrix of 
the dimension of the full state vector: 


H} = W Ps 


Here hi is the Jacobian of the function h(y:, j) at ji;, calculated with respect 
to the state variables zt and m;: 


lita x la: lt,y Ea Hy 0 lj = lt,x my S Lit 0 











a 0 VA O Vli VU 
hi Hj,y 一 Ht,y Ht,x 一 Hj, 1 Ht — Hj,y Hj,x 一 Ht,x 0 
CH qt qt qt qt 
0 0 0 0 0 1 


The scalar q = (ijs — lits)? + (Ajy — Bt), and as before, j = ci is the 
landmark that corresponds to the measurement zt. The matrix Fy; is of di- 
mension 6 x (3N + 3). It maps the low-dimensional matrix hj into a matrix 
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of dimension 3 x (3N + 3): 


1000-0000 Ores 

0100-0000 O20 

00 1 0---0 0 0 0 0:-:-0 

Fej=| 0000-0100 009890 

0000-0010 0-0 
0000-0001 

一 一 一 \ 一 -一 

3j—3 3N—3j 


These expressions make up for the gist of the Kalman gain calculation in lines 
8 through 17 in our EKF SLAM algorithm in Table 10.1, with one important 
extension. When a landmark is observed for the first time, its initial pose 
estimate in Equation (10.8) leads to a poor linearization. This is because with 
the default initialization in (10.8), the point about which h is being linearized 
is (Aje fy Bj). = (0 0 0)7, which is a poor estimator of the actual land- 
mark location. A better landmark estimator is given in line 10 of Table 10.1. 
Here we initialize the landmark estimate (fi; (ij, [ij,s)’ with the expected 
position. This expected position is derived from the expected robot pose and 
the measurement variables for this landmark 


lj ia ri cos(ót + juo) 
By = ht,y + rt sin(d; + Lto) 
lj,s St 0 


We note that this initialization is only possible because the measurement 
function h is bijective. Measurements are two-dimensional, as are landmark 
locations. In cases where a measurement is of lower dimensionality than the 
coordinates of a landmark, h is a true projection and it is impossible to cal- 
culate a meaningful expectation for (jij,. jj, jij.). from a single measure- 
ment only. This is, for example, the case in computer vision implementations 
of SLAM, since cameras often calculate the angle to a landmark but not the 
range. SLAM is then usually performed by integrating multiple sightings 
and applying triangulation to determine an appropriate initial location esti- 
mate. In the SLAM literature, such a problem is known as bearing only SLAM 
and will be further discussed in one of the exercises (page 334). 

Finally, we note that the EKF algorithm requires memory that is quadratic 
in N, the number of landmarks in the map. Its update time is also quadratic 
in N. The quadratic update complexity stems from the matrix multiplica- 
tions that take place at various locations in the EKF. 
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see next page for continuation 
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13: zx = atan2(dx,y, Ok,x) — jito 
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0 1 0 0-0 0 0 0 O0--0 
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18: endfor 
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21: Ni = max{ N;, j(i)} 
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23: li = fie + Ki (28 — 29) 
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25: endfor 
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28: return ju, X, 











Table 10.2 The EKF SLAM algorithm with ML correspondences, shown here with 
outlier rejection. 
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EKF SLAM with Unknown Correspondences 


The General EKF SLAM Algorithm 


The EKF SLAM algorithm with known correspondences is now extended 
into the general EKF SLAM algorithm, which uses an incremental maximum 
likelihood (ML) estimator to determine correspondences. Table 10.2 depicts 
the algorithm for unknown correspondences. 

Since the correspondence is unknown, the input to the algorithm 
EKF_SLAM lacks a correspondence variable c+. Instead, it includes the mo- 
mentary size of the map, N;-1. The motion update in lines 3 through 6 is 
equivalent to the one in EKF SLAM known correspondences in Table 10.1. 
The measurement update loop, however, is different. Starting in line 8, it 
first creates the hypothesis of a new landmark with index N, + 1; this in- 
dex is one larger than the landmarks in the map at this point in time. The 
new landmark's location is initialized in line 9, by calculating its expected 
location given the estimate of the robot pose and the range and bearing in 
the measurement. Line 9 also assigns the observed signature value to this 
new landmark. Next, various update quantities are then computed in lines 
10 through 18 for all N; + 1 possible landmarks, including the “new” land- 
mark. Line 19 sets the threshold for the creation of a new landmark: A new 
landmark is created if the Mahalanobis distance to all existing landmarks in 
the map exceeds the value o. The ML correspondence is then selected in line 
20. If the measurement is associated with a previously unseen landmark, the 
landmark counter is incremented in line 21, and various vectors and matrices 
are enlarged accordingly—this somewhat tedious step is not made explicit in 
Table 10.2. The update of the EKF finally takes place in lines 23 and 24. The 
algorithm EKF_SLAM returns the new number of landmarks N, along with 
the mean u; and the covariance X+. 

The derivation of this EKF SLAM follows directly from previous deriva- 
tions. In particular, the initialization in line 9 is identical to the one in line 
10in EKF SLAM known correspondences, Table 10.1. Lines 10 through 18 
parallel lines 12 through 17 in EKF SLAM known correspondences, with 
the added variable 7; needed for calculating the ML correspondence. The 
selection of the ML correspondence in line 20, and the definition of the Maha- 
lanobis distance in line 17, is analogous to the ML correspondence discussed 
in Chapter 7.5; in particular, the algorithm EKF localization in Table 7.3 on 
page 217 used an analogous equation to determine the most likely landmark 
(line 16). The measurement updates in lines 23 and 24 of Table 10.2 are also 
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analogous to those in the EKF algorithm with known correspondences, as- 
suming that the participating vectors and matrices are of the appropriate 
dimension in case the map has just been extended. 

Our example implementation of EKF_SLAM can be made more efficient 
by restricting the landmarks considered in lines 10 through 18 to those that 
are near the robot. Further, many of the values and matrices calculated in 
this inner loop can safely be cached away when looping through more than 
one feature measurement vector z}. In practice, a good management of fea- 
tures in the map and a tight optimization of this loop can greatly reduce the 
running time. 


Examples 


Figure 10.4 shows the EKF SLAM algorithm—here with known 
correspondence—applied in simulation. The left panel of each of the 
three diagrams plots the posterior distributions, marginalized for the indi- 
vidual landmarks and the robot pose. The right side depicts the correlation 
matrix for the augmented state vector y+; the correlation is the normalized 
covariance. As is easily seen from the result in Figure 10.4c, over time all 
x- and y-coordinate estimates become fully correlated. This means the map 
becomes known in relative terms, up to an uncertain global location that 
cannot be reconciled. This highlights an important characteristic of the 
SLAM problem: The absolute coordinates of a map relative to the coordinate 
system defined by the initial robot pose can only be determined in approxi- 
mation, whereas the relative coordinates can be determined asymptotically 
with certainty. 

In practice, EKF SLAM has been applied successfully to a large range of 
navigation problems, involving airborne, underwater, indoor, and various 
other vehicles. Figure 10.5 shows an example result obtained using the un- 
derwater robot Oberon, developed at the University of Sydney, Australia, 
and shown in Figure 10.6. This vehicle is equipped with a pencil sonar, a 
sonar that can scan at very high resolutions and detect obstacles up to 50 
meters away. To facilitate the mapping problem, researchers have deposited 
long, small vertical objects in the water, which can be extracted from the 
sonar scans with relative ease. In this specific experiment, there is a row of 
such objects, spaced approximately 10 meters apart. In addition, a more dis- 
tant cliff offers additional point features that can be detected using the pencil 
sonar. 

In the experiment shown in Figure 10.5, the robot moves by these land- 
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(a) 


(b) 


(c) 





Figure 10.4 EKF SLAM with known data association in a simulated environment. 
The map is shown on the left, with the gray-level corresponding to the uncertainty 
of each landmark. The matrix on the right is the correlation matrix, which is the 
normalized covariance matrix of the posterior estimate. After some time, all z- and 
all y-coordinate estimates become fully correlated. 
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Estimated Path of the Vehicle 
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Figure 10.5 Example of Kalman filter estimation of the map and the vehicle pose. 
Image courtesy of Stefan Williams and Hugh Durrant-Whyte, Australian Centre for 
Field Robotics. 





Figure 10.6 Underwater vehicle Oberon, developed at the University of Sydney. 
Image courtesy of Stefan Williams and Hugh Durrant-Whyte, Australian Centre for 
Field Robotics. 
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(a) RWI B21 Mobile robot and testing environment 
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Figure 10.7 (a) The MIT B21 mobile robot in a calibrated testing facility. (b) Raw 
odometry of the robot, as it is manually driven through the environment. (c) The 
result of EKF SLAM is a highly accurate map. The image shows the estimated map 
overlayed on a manually constructed map. All images and results are courtesy of 
John Leonard and Matthew Walter, MIT. 


marks, then turns around and moves back. While doing so, it measures and 
integrates landmarks into its map using the EKF SLAM algorithm described 
in this chapter. 

The map shown in Figure 10.5 shows the robot's path, marked by the trian- 
gles connected by a line. Around each triangle one can see an ellipse, which 
corresponds to the covariance matrix of the Kalman filter estimate projected 
into the robot's z-y position. The ellipse shows the variance; the larger it 
is, the less certain the robot is about its current pose. Various small dots in 
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Figure 10.5 show landmark sightings, obtained by searching the sonar scan 
for small and highly reflective objects. The majority of these sightings are 
rejected, using a mechanism described in the next section. However, some 
are believed to correspond to a landmark and are added to the map. At the 
end of the run, the robot has classified 14 such objects as landmarks, each of 
which is plotted with the projected uncertainty ellipse in Figure 10.5. These 
landmarks include the artificial landmarks put out by the researchers, but 
they also include various other terrain features in the vicinity of the robot. 
The residual pose uncertainty is small. 

Figure 10.7 shows the result of another EKF SLAM implementation. Panel 
(a) shows MIT’s RWI B21 mobile robot, situated in a testing environment. 
The testing environment is a tennis court; obstacles are hurdles whose po- 
sition was measured manually with centimeter accuracy for evaluation pur- 
poses. Panel (b) of Figure 10.7 shows the raw odometry path. The result of 
EKF SLAM is shown in Panel (c), overlayed with the manually constructed 
map. The reader should verify that this is indeed an accurate map. 


Feature Selection and Map Management 


Making EKF SLAM robust in practice often requires additional techniques 
for map management. Many of them pertain to the fact that the Gaussian noise 
assumption is unrealistic, and many spurious measurements occur in the far 
tail end of the noise distribution. Such spurious measurements can cause the 
creation of fake landmarks in the map which, in turn, negatively affect the 
localization of the robot. 

Many state-of-the-art techniques possess mechanisms to deal with outliers 
in the measurement space. Such outliers are defined as spurious landmark 
sightings outside the uncertainty range of any landmark in the map. The 
most simple technique to reject such outliers is to maintain a provisional land- 
mark list. Instead of augmenting the map by a new landmark once a mea- 
surement indicates the existence of a new landmark, such a new landmark 
is first added to a provisional list of landmarks. This list is just like the map, 
but landmarks on this list are not used to adjust the robot pose (the corre- 
sponding gradients in the measurement equations are set to zero). Once 
a landmark has consistently been observed and its uncertainty ellipse has 
shrunk, it is transitioned into the regular map. 

In practical implementations, this mechanism tends to reduce the number 
of landmarks in the map by a significant factor, while still retaining all phys- 
ical landmarks with high probability. A further step, also commonly found 
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in state-of-the-art implementations, is to maintain a landmark existence prob- 
ability. Such a posterior probability may be implemented as log odds ratio 
and be denoted oj for the j-th landmark in the map. Whenever the j-th land- 
mark m; is observed, o; is incremented by a fixed value. Not observing m; 
when it would be in the perceptual range of the robot's sensors leads to a 
decrement of o;. Since it can never be known with certainty whether a land- 
mark is within a robot's perceptual range, the decrement may factor in the 
probability of such an event. Landmarks are removed from the map when 
the value oj drops below a threshold. Such techniques lead to much leaner 
maps in the face of non-Gaussian measurement noise. 

When initializing the estimate for a new landmark starting with a covari- 
ance with very large elements—as suggested in Equation (10.9) 一 may induce 
numerical instabilities. This is because the very first covariance update step 
will change this value by several orders of magnitude, too many perhaps for 
generating a matrix that is still positive definite. A better strategy involves an 
explicit initialization step for any feature that has not been observed before. 
In particular, such a step would initialize the covariance Y; directly with the 
actual landmark uncertainty, instead of executing line 24 in Table 10.2 (same 
with the mean in line 23). 

As noted previously, the maximum likelihood approach to data associa- 
tion has a clear limitation, which arises from the fact that the maximum like- 
lihood approach deviates from the idea of full posterior estimation in prob- 
abilistic robotics. Instead of maintaining a joint posterior over augmented 
states and data associations, it reduces the data association problem to a de- 
terministic determination, which is treated as if the maximum likelihood as- 
sociation was always correct. This limitation makes EKF brittle with regards 
to landmark confusion, which in turn can lead to wrong results. In practice, 
researchers often remedy the problem by choosing one of the following two 
methods, both of which reduce the chances of confusing landmarks: 


* Spatial arrangement. The further apart landmarks are, the smaller the 
chance to accidentally confuse them. It is therefore common practice to 
choose landmarks that are sufficiently far away from each other so that 
the probability of confusing one with another becomes small. This intro- 
duces an interesting trade-off: a large number of landmarks increases the 
danger of confusing them. Too few landmarks makes it more difficult to 
localize the robot, which in turn also increases the chances of confusing 
landmarks. Little is currently known about the optimal density of land- 
marks, and researchers often use intuition when selecting specific land- 
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marks. 


。 Signatures. When selecting appropriate landmarks, it is essential to max- 
imize the perceptual distinctiveness of landmarks. For example, doors 
might possess different colors, or corridors might have different widths. 
The resulting signatures are essential for successful SLAM. 


With these additions, the EKF SLAM algorithm has indeed been applied suc- 
cessfully to a wide range of practical mapping problems, involving robotic 
vehicles in the air, on the ground, and underwater. 

A key limitation of EKF SLAM lies in the necessity to select appropriate 
landmarks. By reducing the sensor stream to the presence and absence of 
landmarks, a lot of sensor data is usually discarded. This results in an in- 
formation loss relative to a SLAM algorithm that can exploit sensors without 
extensive pre-filtering. Further, the quadratic update time of the EKF limits 
this algorithm to relatively scarce maps with less than 1,000 features. In prac- 
tice, one often seeks maps with 10° features or more, in which case the EKF 
ceases to be applicable. 

The relatively low dimensionality of the map tends to create a harder data 
association problem. This is easily verified: When you open your eyes and 
look at the full room you are in, you probably have no difficulty in recog- 
nizing where you are! However, if you are only told the location of a small 
number of landmarks—e.g., the location of all light sources—the decision is 
much harder. As a result, data association in EKF SLAM is more difficult 
than in some of the SLAM algorithms discussed in subsequent chapters, and 
capable of handling orders of magnitude more features. This culminates into 
the fundamental dilemma of EKF SLAM: While incremental maximum likeli- 
hood data association might work well with dense maps with hundreds of 
millions of features, it tends to be brittle with scarce maps. However, EKFs 
require sparse maps because of the quadratic update complexity. In subse- 
quent chapters, we will discuss SLAM algorithms that are more efficient and 
can handle much larger maps. We will also discuss more robust data as- 
sociation techniques. For its many limitations, the value of the EKF SLAM 
algorithm presented in this chapter is mostly historical. 


Summary 


This chapter described the general SLAM problem and introduced the EKF 
approach. 
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* The SLAM problem is defined as a concurrent localization and mapping 
problem, in which a robot seeks to acquire a map of the environment 
while simultaneously seeking to localize itself relative to this map. 


* The SLAM problem comes in two versions: online and global. Both prob- 
lems involve the estimation of the map. The online problem seeks to es- 
timate the momentary robot pose, whereas the global problem seeks to 
determine all poses. Both problems are of equal importance in practice, 
and have found equal coverage in the literature. 


* The EKF SLAM algorithm is arguably the earliest SLAM algorithm. It 
applies the extended Kalman filter to the online SLAM problem. With 
known correspondences, the resulting algorithm is incremental. Updates 
require time quadratic in the number of landmarks in the map. 


* When correspondences are unknown, the EKF SLAM algorithm applies 
an incremental maximum likelihood estimator to the correspondence 
problem. The resulting algorithm works well when landmarks are suf- 
ficiently distinct. 


* Additional techniques were discussed for managing maps. Two common 
strategies for identifying outliers include a provisional list for landmarks 
that are not yet observed sufficiently often, and a landmark evidence 
counter that calculates the posterior evidence of the existence of a land- 
mark. 


* EKF SLAM has been applied with considerable success in a number of 
robotic mapping problems. Its main drawback is the need for sufficiently 
distinct landmarks, and the computational complexity required for up- 
dating the filter. 


In practice, EKF SLAM has been applied with some success. When land- 
marks are sufficiently distinct, the approach approximates the posterior well. 
The advantage of calculating a full posterior are manifold: It captures all 
residual uncertainty and enables the robot to reason about its control tak- 
ing its true uncertainty into account. However, the EKF SLAM algorithm 
suffers from its enormous update complexity, and the limitation to sparse 
maps. This, in turn, makes the data association problem a difficult one, and 
EKF SLAM tends to work poorly in situations where landmarks are highly 
ambiguous. Further brittleness is due to the fact that the EKF SLAM algo- 
rithm relies on an incremental maximum likelihood data association tech- 
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nique. This technique makes it impossible to revise past data associations, 
and can induce failure when the ML data association is incorrect. 

The EKF SLAM algorithm applies to the online SLAM problem; it is in- 
applicable to the full SLAM problem. In the full SLAM problem, the addi- 
tion of a new pose to the state vector at each time step would make both 
the state vector and the covariance grow without bounds. Updating the co- 
variance would therefore require an ever-increasing amount of time, and the 
approach would quickly run out of computational time no matter how fast 
the processor. 


Bibliographical Remarks 


The problem of SLAM predates the invention of modern robots by many centuries. The prob- 
lem of modeling a physical structure from a moving sensor platform is at the core of a number 
of fields, such as geosciences, photogrammetry, and computer vision. Many of the mathematical 
techniques that form the core of the SLAM work today were first developed for calculating plan- 
etary orbits. For example, the least squares method can be traced back to Johann Carl Friedrich 
Gauss (1809). SLAM is essentially a geographic surveying problem. Teleported to a robot it 
creates challenges that human surveyors rarely face, such as the correspondence problem and 
the problem of finding appropriate features. 

In robotics, the EKF to the SLAM problem was introduced through a series of seminal papers 
by Cheeseman and Smith (1986); Smith and Cheeseman (1986); Smith et al. (1990). These papers 
were the first to describe the EKF approach discussed in this chapter. Just like us in this book, 
Smith et al. discussed the EKF in the context of feature-based mapping with point landmarks, 
and known data association. The first implementations of EKF SLAM were due to Moutarlier 
and Chatila (1989a,b) and Leonard and Durrant-Whyte (1991), some using artificial beacons as 
landmarks. The EKF became fashionable at a time when many authors investigated alternative 
techniques for maintaining accurate pose estimates during mapping (Cox 1991). Early work 
by Dickmanns and Graefe (1988) on estimation road curvature in autonomous cars is highly 
related; see (Dickmanns 2002) for a survey. 

SLAM is a highly active field of research, as a recent workshop indicates (Leonard et al. 
2002b). An extensive literature for the field of SLAM—or CML for concurrent mapping and lo- 
calization as Leonard and Durrant-Whyte (1991) call it—can be found in Thrun (2002). The 
importance of maintaining correlations in the map was pointed out by Csorba (1997), who in 
his Ph.D. thesis, who also established some basic convergence results. Since then, a number of 
authors have extended the basic paradigm in many different ways. The feature management 
techniques described in this chapter are due to Dissanayake et al. (2001, 2002); see also Bailey 
(2002). Williams et al. (2001) developed the idea of provisional feature lists in SLAM, to reduce 
the effect of feature detection errors. Feature initialization is discussed in Leonard et al. (2002a), 
who explicitly maintains an estimate of previous poses to accommodate sensors that provide 
incomplete data on feature coordinates. A representation that avoids singularities by explicitly 
factoring “perturbations” out of the posterior was devised by Castellanos et al. (1999), who re- 
ports improved numerical stability over the vanilla EKF. Jensfelt et al. (2002) found significant 
improvements in indoor SLAM when utilizing basic geometric constraints, such as the fact that 
most walls are parallel or orthogonal. Early work on SLAM with sonars goes back to Rencken 
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(1993); a state-of-the-art system for SLAM with sonar sensors can be found in Tardós et al. (2002). 
Castellanos et al. (2004) provided a critical consistency analysis for the EKF. An empirical com- 
parison of multiple algorithms can be found in Vaganay et al. (2004). A few open questions are 
discussed by Salichs and Moreno (2000). Research on the important data association problem 
will be reviewed in a later chapter (see page 481). 

As noted, a key limitation of the EKF solution to the SLAM problem lies in the quadratic 
nature of the covariance matrix. This "flaw" has not remained unnoticed. In the past few years, 
a number of researchers have proposed EKF SLAM algorithms that gain remarkable scalability 
through decomposing the map into submaps, for which covariances are maintained separately. 
Some of the original work in this field is by Leonard and Feder (1999), Guivant and Nebot 
(2001), and Williams (2001). Leonard and Feder's (1999) decoupled stochastic mapping algorithm 
decomposes the map into collections of smaller, more manageable submaps. This approach is 
computationally efficient, but does not provide a mechanism to propagate information through 
the network of local maps (Leonard and Feder 2001). Guivant and Nebot (2001, 2002), in con- 
trast, provided an approximate factorization of the covariance matrix which reduced the actual 
complexity of EKF updating by a significant factor. Williams (2001) and Williams et al. (2002) 
proposed the constrained local submap filter (CLSF), which relies on creating independent local 
submaps of the features in the immediate vicinity of the vehicle. Williams et al. (2002) provides 
results for underwater mapping (see Figure 10.5 for some of his early work). The sequential 
map joining techniques described in Tardós et al. (2002) is a related decomposition. Bailey (2002) 
devises a similar technique for representing SLAM maps hierarchically. Folkesson and Chris- 
tensen (2003) describes a technique by which frequent updates are limited to a small region 
near the robot, whereas the remainder of the map is updated at much lower frequencies. All 
these techniques achieve the same rate of convergence as the full EKF solution, but incur an 
O(n?) computational burden. However, they scale much better to large problems with tens of 
thousands of features. 

A number of researchers have developed hybrid SLAM techniques, which combine EKF-style 
SLAM techniques with volumetric techniques, such as occupancy grid maps. The hybrid metric 
map (HYMM) by Guivant et al. (2004) and Nieto et al. (2004) decomposes maps into triangular 
regions (LIRs) using volumetric maps such as occupancy grid maps as a basic representation 
for those regions. These local maps are combined using EKFs. Burgard et al. (1999b) also de- 
composes maps into local occupancy grid maps, but uses the expectation maximization (EM) al- 
gorithm (see Dempster et al. (1977)) for combining local maps into a joint global map. The work 
by Betgé-Brezetz et al. (1995, 1996) integrated two types of representations into a SLAM frame- 
work: bitmaps for representing outdoor terrain, and object representations for sparse outdoor 
objects. 

Extensions of SLAM to dynamic environments can be found in Wang et al. (2003), Hahnel 
et al. (2003c), and Wolf and Sukhatme (2004). Wang et al. (2003) developed an algorithm called 
SLAM with DATMO, short for SLAM with the detection and tracking of moving objects. Their 
approach is based on the EKF, but it allows for the possible motion of features. Háhnel et al. 
(2003c) studied the problem of performing SLAM in environments with many moving objects. 
They successfully employed the EM algorithm for filtering out measurements that likely corre- 
spond to moving objects. By doing so, they were able to acquire maps in environments where 
conventional SLAM techniques failed. The approach in Wolf and Sukhatme (2004) maintains 
two coupled occupancy grids of the environment, one for the static map, and one for moving 
objects. SLAM-style localization is achieved by a regular landmark-based SLAM algorithm. 

SLAM systems have been brought to bear in a number of deployed systems. Rikoski et al. 
(2004) applied SLAM to sonar odometry of submarine, providing a new approach for "auditory 
odometry." SLAM in abandoned mines is described in Nüchter et al. (2004), who extended the 
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paradigm to full 6-D pose estimation. Extensions to the multi-robot SLAM problem have been 
proposed by a number of researchers. Some of the earlier work is by Nettleton et al. (2000), 
who developed a technique by which vehicles maintain local EKF maps, but fuse them using 
the information representation of the posterior. An alternative technique is due to Rekleitis et al. 
(2001a), who use a team of stationary and moving robots to reduce the localization error when 
performing SLAM. Fenwick et al. (2002) provides a comprehensive theoretical investigation of 
multi vehicle map fusion, specifically for landmark based SLAM. Techniques for fusing scans 
were developed in Konolige et al. (1999); Thrun et al. (2000b); Thrun (2001). 

A number of researchers have developed SLAM systems for specific sensor types. An im- 
portant sensor is a camera; however, cameras only provide bearing to features. This problem 
is well-studied in the computer vision literature as structure from motion (SFM) (Tomasi and 
Kanade 1992; Soatto and Brockett 1998; Dellaert et al. 2003), and in the field of photogramme- 
try (Konecny 2002). Within SLAM, the seminal work on bearing only SLAM is due to Deans and 
Hebert (2000, 2002). Their approach recursively estimates features of the environment that are 
invariant to the robot pose, so as to decouple the pose error from the map error. A great num- 
ber of researchers has applied SLAM using cameras as the primary sensor (Neira et al. 1997; 
Cid et al. 2002; Davison 2003). Davison (1998) provides active vision techniques in the context 
of SLAM. Dudek and Jegessur’s (2000) work relies on place recognition based on appearance, 
whereas Hayet et al. (2002) and Bouguet and Perona (1995) use visual landmarks. Diebel et al. 
(2004) developed a filter for SLAM with an active stereo sensor that accounts for the nonlinear 
noise distribution of a stereo range finder. Sensor fusion techniques for SLAM were developed 
by Devy and Bulata (1996). Castellanos et al. (2001) found empirically that fusing laser and 
camera outperformed each sensor modality in isolation. 

SLAM has also been extended to the problem of building dense 3-D models. Early systems 
for acquiring 3-D models with indoor mobile robots can be found in Reed and Allen (1997); 
Iocchi et al. (2000); Thrun et al. (2004b). Devy and Parra (1998) acquire 3-D models using para- 
metric curves. Zhao and Shibasaki (2001), Teller et al. (2001), and Frueh and Zakhor (2003) have 
developed impressive systems for building large textured 3-D maps of urban environments. 
Neither of these systems addresses the full SLAM problem due to the availability of outdoor 
GPS, but they are highly related to the mathematical basis of SLAM. These techniques blend 
smoothly with a rich body of work on aerial reconstruction of urban environments (Jung and 
Lacroix 2003; Thrun et al. 2003). 

The following chapters discuss alternatives to the plain EKFs. The techniques described there 
share many of the intuitions with the extensions discussed here, as the boundary between dif- 
ferent types of filters has become almost impossible to draw. The literature review will be con- 
tinued after the next chapter, when discussing a SLAM algorithm using information-theoretic 
representations. 


Exercises 


1. What is the computational complexity of the motion update in EKF 
SLAM? Use the O( ) notation. Compare this with the worst case com- 
plexity for EKFs over a feature vector of the same size. 


2. Bearing only SLAM refers to the SLAM problem when the sensors can only 
measure the bearing of a landmark but not its range. As noted, bearing 
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only SLAM is closely related to Structure from Motion (SFM) in Computer 
Vision. One problem in bearing only SLAM with EKFs concerns the ini- 
tialization of landmark location estimates, even if the correspondences are 
known. Discuss why, and devise a technique for initializing the landmark 
location estimates (means and covariances) that can be applied in bearing 
only SLAM. 


3. On page 329, we remarked that the EKF algorithm in Table 10.2 can be- 
come numerically instable. Devise a method for setting jj, and X; directly 
when a new feature is observed for the first time. Such a technique would 
not require initializing the covariance with very large values. Show that 
the result is mathematically equivalent to lines 23 and 24 when the covari- 
ance is initialized as in Equation (10.9). 


4. The text suggests using a binary Bayes filter to compute the probability 
that a landmark represented in the posterior actually exists in the physical 
world. 


(a) In the first part of this exercise, you are asked to design such a binary 
Bayes filter. 


(b) Now extend your filter to a situation in which landmarks sporadically 
disappear with a probability p*. 


(c) Picture a situation where for a well-established landmark, no informa- 
tion is received for a long time with regards to its existence (no positive 
and no negative information). To what value will your filter converge? 
Prove your answer. 


5. The EKF SLAM algorithm, as presented here, is unable to cope with the 
data association problem in a sound statistical way. Lay out an algorithm 
(and a statistical framework) for posterior estimation with unknown data 
association that represents the posterior by mixtures of Gaussians, and 
characterize its advantages and disadvantages. How does the complexity 
of the posterior grow over time? 


6. Based on the previous problem, develop an approximate method for pos- 
terior estimation with unknown data association where the time needed 
for each incremental update step does not grow over time (assuming a 
fixed number of landmarks). 


7. Develop a Kalman filter algorithm that uses local occupancy grid maps as 
its basic components, instead of landmarks. Among other things, prob- 
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lems that have to be solved are how to relate local grids to each other, and 
how to deal with the ever-growing number of local grids. 
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The GraphSLAM Algorithm 


Introduction 


The EKF SLAM algorithm described in the previous chapter is subject to a 
number of limitations. One of them is its quadratic update complexity; an- 
other is the linearization technique in EKFs, which is only performed once in 
the EKF for each nonlinear term. In this chapter, we introduce an alternative 
SLAM algorithm, called GraphSLAM. In contrast to the EKF, GraphSLAM 
solves the full SLAM problem. It calculates a solution for the offline problem 
defined over all poses and all features in the map. As we will show in this 
chapter, the posterior of the full SLAM problem naturally forms a sparse graph. 
This graph leads to a sum of nonlinear quadratic constraints. Optimizing 
these constraints yields a maximum likelihood map and a corresponding set 
of robot poses. Historically, this idea can be found ina large number of SLAM 
publications. The name “GraphSLAM” has been chosen because it captures 
the essence of this approach. 

Figure 11.1 illustrates the GraphSLAM algorithm. Shown there is the 
graph that GraphSLAM extracts from five poses labeled xo,...,x%4, and two 
map features mı, m». Arcs in this graph come in two types: motion arcs and 
measurement arcs. Motion arcs link any two consecutive robot poses, and 
measurement arcs link poses to features that were measured there. Each edge 
in the graph corresponds to a nonlinear constraint. As we shall see later, 
these constraints represent the negative log likelihood of the measurement 
and the motion models, hence are best thought of as information constraints. 
Adding such a constraint to the graph shall prove to be trivial for Graph- 
SLAM; it involves no significant computation. The sum of all constraints 
results in a nonlinear least squares problem, as stated in Figure 11.1. 

To compute a map posterior, GraphSLAM linearizes the set of constraints. 
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Figure 11.1 GraphSLAM illustration, with 4 poses and two map features. Nodes 
in the graphs are robot poses and feature locations. The graph is populated by two 
types of edges: Solid edges link consecutive robot poses, and dashed edges link poses 
with features sensed while the robot assumes that pose. Each link in GraphSLAM is a 
non-linear quadratic constraint. Motion constraints integrate the motion model; mea- 
surement constraints the measurement model. The target function of GraphSLAM is 
the sum of these constraints. Minimizing it yields the most likely map and the most 
likely robot path. 


The result of linearization is an information matrix and an information vector 
of essentially the same form as already encountered in Chapter 3, when we 
discussed the information filter. However, the information matrix inherits 
the sparseness from the graph constructed by GraphSLAM. This sparseness 
enables GraphSLAM to apply the variable elimination algorithm, thereby 
transforming the graph into a much smaller one only defined over robot 
poses. The path posterior map is then calculated using standard inference 
techniques. GraphSLAM also computes a map and certain marginal posteri- 
ors over the map; the full map posterior is of course quadratic in the size of 
the map and hence is usually not recovered. 

In may ways, EKF SLAM and GraphSLAM are extreme ends of the spec- 
trum of SLAM algorithms. A primary difference between EKF SLAM and 
GraphSLAM pertains to the representation of information. "While EKF 
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SLAM represents information through a covariance matrix and a mean vec- 
tor, GraphSLAM represents the information as a graph of soft constraints. 
Updating the covariance in an EKF is computationally expensive; whereas 
growing the graph is cheap! 

Such savings come at a price. GraphSLAM requires additional inference 
when recovering the map and the path, whereas EKF maintains its best esti- 
mate of the map and the robot pose at all times. The build-up of the graph 
is followed by a separate computational phase in which this information is 
transformed into an estimate of the state. No such phase is required for EKF 
SLAM. 

Consequently, one may think of EKF as a proactive SLAM algorithm, in 
the sense that it resolves any new piece of information immediately into an 
improved estimate of the state of the world. GraphSLAM, in contrast, is 
more like a lazy SLAM technique, which simply accumulates information 
into its graph without resolving it. This difference is significant. GraphSLAM 
can acquire maps that are many orders of magnitude larger than EKFs can 
handle. 

There are further differences between EKF SLAM and GraphSLAM. As a 
solution to the full SLAM problem, GraphSLAM calculates posteriors over 
robot paths, hence is not an incremental algorithm. This approach is differ- 
ent from EKF SLAM, which, as a filter, only maintains a posterior over the 
momentary pose of the robot. EKF SLAM enables a robot to update its map 
forever, whereas GraphSLAM is best suited for problems where one seeks a 
map from a data set of fixed size. EKF SLAM can maintain a map over the 
entire lifetime of a robot without having to worry about the total number of 
time steps elapsed since the beginning of data acquisition. 

Because GraphSLAM has access to the full data when building the map, 
it can apply improved linearization and data association techniques. In EKF 
SLAM, the linearization and the correspondence for a measurement at time t 
are calculated based on the data up to time t. In GraphSLAM all data can be 
used to linearize and to calculate correspondences. Put differently, Graph- 
SLAM can revise past data association, and it can linearize more than once. 
In fact, GraphSLAM iterates the three crucial steps in mapping: the con- 
struction of the map, the calculation of correspondence variables, and the 
linearization of the measurement and motion models—so as to obtain the 
best estimate of all of those quantities. As a result of all this, GraphSLAM 
tends to produce maps that are superior in accuracy to maps generated by 
EKFs. 

However, GraphSLAM is not without limitations when compared to the 
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EKF approach. One was already discussed: The size of the graph grows 
linearly over time, whereas the EKF shows no such time dependence in the 
amount of memory allocated to its estimate. Another pertains to data associ- 
ation. Whereas in EKF SLAM data association probabilities can easily be ob- 
tained from the posterior’s covariance matrix, computing the same probabil- 
ities in GraphSLAM requires inference. This difference will be elucidated be- 
low, where we define an explicit algorithm for computing correspondences 
in GraphSLAM. Thus, which method is preferable is very much a question 
of the application, as there is no single method that would be superior in all 
critical dimensions. 

This chapter first describes the intuition behind GraphSLAM and its basic 
updates steps. We then derive the various update steps mathematically and 
prove its correctness relative to specific linear approximations. A technique 
for data association will also be devised, followed by a discussion of actual 
implementations of the GraphSLAM algorithm. 


Intuitive Description 


The basic intuition behind GraphSLAM is remarkably simple: GraphSLAM 
extracts from the data a set of soft constraints, represented by a sparse graph. 
It obtains the map and the robot path by resolving these constraints into a 
globally consistent estimate. The constraints are generally nonlinear, but in 
the process of resolving them they are linearized and transformed into an in- 
formation matrix. Thus, GraphSLAM is essentially an information-theoretic 
technique. We will describe GraphSLAM both as a technique for building a 
sparse graph of nonlinear constraints, and as a technique for populating a 
sparse information matrix of linearized constraints. 


Building Up the Graph 


Suppose we are given a set of measurements z4., with associated correspon- 
dence variables c1:+, and a set of controls u1:+. GraphSLAM turns this data 
into a graph. The nodes in the graph are the robot poses z;.; and the features 
in the map m = (m;). Each edge in the graph corresponds to an event: a 
motion event generates an edge between two robot poses, and a measure- 
ment event creates a link between a pose and a feature in the map. Edges 
represent soft constraints between poses and features in GraphSLAM. 

For a linear system, these constraints are equivalent to entries in an infor- 
mation matrix and an information vector of a large system of equations. As 
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usual, we will denote the information matrix by €? and the information vec- 
tor by €. As we shall see below, each measurement and each control leads to 
a local update of Q and £, which corresponds to a local addition of an edge 
to the graph in GraphSLAM. In fact, the rule for incorporating a control or a 
measurement into Q and € is a local addition, paying tribute to the important 
fact that information is an additive quantity. 

Figure 11.2 illustrates the process of constructing the graph along with the 
corresponding information matrix. First consider a measurement z}. This 
measurement provides information between the location of the feature j — cj 
and the robot pose zt at time t. In GraphSLAM, this information is mapped 
into a constraint between x, and m;. We can think of this edge as a “spring” 
in a spring-mass model. As we shall see below, the constraint is of the type: 


(zt — h(xe,my))" Qe" (t — hlen m;) 


Here h is the familiar measurement function, and Q; is the covariance of the 
measurement noise. Figure 112a shows the addition of such a link into the 
graph maintained by GraphSLAM. 

Now consider robot motion. The control u; provides information about 
the relative value of the robot pose at time ¢— 1 and the pose at time t. Again, 
this information induces a constraint in the graph, which will be of the form 


(zi — g(ui,241—1))*. Rt (we — glut, 2021) 


Here g is the familiar kinematic motion model of the robot, and R; is the 
covariance of the motion noise. 

Figure 112b illustrates the addition of such a link in the graph. It also 
shows the addition of a new element in the information matrix, between the 
pose x, and the measurement z}. This update is again additive. As before, 
the magnitude of these values reflects the residual uncertainty R, due to the 
measurement noise; the less noisy the sensor, the larger the value added to (? 
and €. 

After incorporating all measurements 2,4 and controls u;4, we obtain a 
sparse graph of soft constraints. The number of constraints in the graph is 
linear in the time elapsed, hence the graph is sparse. The sum of all con- 
straints in the graph will be of the form 


JarphSLAM = XQ Q% Xo 十 24 az, — g(ue,@1-1))” Ri (az — g(ue, 21-1) 


十 26 — hl, é) Qr! (5 — h(ys ei) 
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(a) Observation ls landmark m1 


m, 
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(b) Robot motion from z to £2 








(c) Several steps later 








Figure 11.2 Illustration of the acquisition of the information matrix in GraphSLAM. 
The left diagram shows the dependence graph, the right the information matrix. 
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It is a function defined over pose variables x1., and all feature locations in 
the map m. Notice that this expression also features an anchoring constraint 
of the form z7 Qo zo. This constraint anchors the absolute coordinates of the 
map by initializing the very first pose of the robot as (0 0 0)T. 

In the associated information matrix €), the off-diagonal elements are all 
zero with two exceptions: Between any two consecutive poses x;_; and 2; 
will be a non-zero value that represents the information link introduced by 
the control u;. Also non-zero will be any element between a map feature m; 
and a pose x+, if m; was observed when the robot was at x+. All elements 
between pairs of different features remain zero. This reflects the fact that 
we never received information pertaining to their relative location—all we 
receive in SLAM are measurements that constrain the location of a feature 
relative to a robot pose. Thus, the information matrix is equally sparse; all 
but a linear number of its elements are zero. 


Inference 


Of course, neither the graph representation nor the information matrix rep- 
resentation gives us what we want: the map and the path. In GraphSLAM, 
the map and the path are obtained from the linearized information matrix, 
via u = Q'E (see Equation (3.73) on page 72). This operation requires us to 
solve a system of linear equations. This raises the question on how efficiently 
we can recover the map estimate j and the covariance X. 

The answer to the complexity question depends on the topology of the 
world. If each feature is seen only locally in time, the graph represented 
by the constraints is linear. Thus, Q can be reordered so that it becomes a 
band-diagonal matrix, and all non-zero values occur near its diagonal. The 
equation u = ()^!£ can then be computed in linear time. This intuition car- 
ries over to cycle-free world that is traversed once, so that each feature is seen 
for a short, consecutive period of time. 

The more common case, however, involves features that are observed mul- 
tiple times, with large time delays in between. This might be the case because 
the robot goes back and forth through a corridor, or because the world pos- 
sesses cycles. In either situation, there will exist features m; that are seen at 
drastically different time steps z+; and z;,, with t2 >> tı. In our constraint 
graph, this introduces a cyclic dependence: x;, and x, are linked through 
the sequence of controls ut 41, ut, 42, ..., ut, and through the joint observa- 
tion links between z+, and mj, and z+, and m;, respectively. Such links make 
our variable reordering trick inapplicable, and recovering the map becomes 
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more complex. In fact, since the inverse of Q is multiplied with a vector, 
the result can be computed with optimization techniques such as conjugate 
gradient, without explicitly computing the full inverse matrix. Since most 
worlds possess cycles, this is the case of interest. 

The GraphSLAM algorithm now employs an important factorization trick, 
which we can think of as propagating information through the information 
matrix (in fact, it is a generalization of the well-known variable elimination 
algorithm for matrix inversion). Suppose we would like to remove a feature 
mj from the information matrix €? and the information state €. In our spring 
mass model, this is equivalent to removing the node and all springs attached 
to this node. As we shall see below, this is possible by a remarkably simple 
operation: We can remove all those springs between m; and the poses at 
which m; was observed, by introducing new springs between any pair of 
such poses. 

This process is illustrated in Figure 11.3, which shows the removal of two 
map features, mı and ma (the removal of m» and mz is trivial in this ex- 
ample). On both cases, the feature removal modifies the link between any 
pair of poses from which a feature was originally observed. As illustrated in 
Figure 11.3b, this operation may lead to the introduction of new links in the 
graph. In the example shown there, the removal of m3 leads to a new link 
between x2 and z4. 

Formally, let 7(j) be the set of poses at which m; was observed (that is: 
rz, € T(j) 4> Hi : cj = j). Then we already know that the feature m; 
is only linked to poses zt in 7(j); by construction, m; is not linked to any 
other pose, or to any feature in the map. We can now set all links between 
m; and the poses 7(7) to zero by introducing a new link between any two 
poses z,,zp € T(j). Similarly, the information vector values for all poses 
T(j) are also updated. An important characteristic of this operation is that 
it is local: It only involves a small number of constraints. After removing 





all links to mj, we can safely remove m; from the information matrix and 
vector. The resulting information matrix is smaller—it lacks an entry for m;. 
However, it is equivalent for the remaining variables, in the sense that the 
posterior defined by this information matrix is mathematically equivalent to 
the original posterior before removing m;. This equivalence is intuitive: We 
simply have replaced springs connecting m; to various poses in our spring 
mass model by a set of springs directly linking these poses. In doing so, 
the total force asserted by these springs remains equivalent, with the only 
exception that m; is now disconnected. 

The virtue of this reduction step is that we can gradually transform our 
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(a) The removal of mı changes the link between x1 and x2 
X, X4 X, x, m, m, m, m, 
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(c) Final Result after removing all map features 
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Figure 11.3 Reducing the graph in GraphSLAM: Arcs are removed to yield a net- 
work of links that only connect robot poses. 
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inference problem into a smaller one. By removing each feature m; from 
Q and £, we ultimately arrive at a much smaller information form Q and 
€ defined only over the robot path variables. The reduction can be carried 
out in time linear in the size of the map; in fact, it generalizes the variable 
elimination technique for matrix inversion to the information form, in which 
we also maintain an information state. The posterior over the robot path is 
now recovered as X = Q^! and ji = X£. Unfortunately, our reduction step 
does not eliminate cycles in the posterior. The remaining inference problem 
may still require more than linear time. 

As a last step, GraphSLAM recovers the feature locations. Conceptually, 
this is achieved by building a new information matrix €; and information 
vector £; for each mj. Both are defined over the variable m; and the poses 
T(j) at which m; were observed. It contains the original links between m; 
and 7(j), but the poses 7(j) are set to the values in ñ, without uncertainty. 
From this information form, it is now simple to calculate the location of m;, 
using the common matrix inversion trick. Clearly, Q; contains only elements 
that connect to m;; hence the inversion takes time linear in the number of 
poses in 7(;). 

It should be apparent why the graph representation is such a natural rep- 
resentation. The full SLAM problem is solved by locally adding information 
into a large information graph, one edge at-a-time for each measurement z; 
and each control u;. To turn such information into an estimate of the map 
and the robot path, it is first linearized, then information between poses and 
features is gradually shifted to information between pairs of poses. The re- 
sulting structure only constraints the robot poses, which are then calculated 
using matrix inversion. Once the poses are recovered, the feature locations 
are calculated one-after-another, based on the original feature-to-pose infor- 
mation. 


The GraphSLAM Algorithm 


We will now make the various computational steps of the GraphSLAM pre- 
cise. The full GraphSLAM algorithm will be described in a number of steps. 
The main difficulty in implementing the simple additive information algo- 
rithm pertains to the conversion of a conditional probability of the form 
p(zi | z+, m) and p(x: | ui, 24-1) into a link in the information matrix. The 
information matrix elements are all linear; hence this step involves lineariz- 
ing p(z] | zs, m) and p(z: | ui 24-1). In EKF SLAM, this linearization was 
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1: Algorithm GraphSLAM_initialize(u,,;): 
H0,x 0 
2: Hoy | = | 0 
10,0 0 
3: for all controls u, = (v; wz)" do 
Ht,x Ht—1,x 
4: Lt,y >= Ht—1,y 
Ht,0 Ht—1,0 
= sin 41,0 + ne sin(14—1,0 + 4 At) 
4: 十 A COS J41—1,8 一 DE cos(p—1,0 + wAt) 
QUAE 
5: endfor 
6: return Lo: 








Table 11.1 Initialization of the mean pose vector Ht in the GraphSLAM algorithm. 





1: 





Algorithm GraphSLAM _linearize(uy., 21:1, C1:t, 10:4): 


set —0,£ =0 
co 0 0 

add 0 oo 0 to Q at xo 
0 0 oo 


for all controls u, = (v; w)? do 
m sin 41,0 + FA sin(u:—1,9 + o AL) 





f= pe- H | cos fr-1,6 — n cos(Hr-1,9 + wr At) 
w,At 
1 0 ue COS 141—1,0 十 m cos(ut—1,9 + w,At) 
Gi=] 0 1 = sin Nt _1.6 + pa sin(1; 1,0 + w,At) 
0 0 1 





see next page for continuation | 
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18: 


19: 


20: 
2]: 
22: 








continued from the previous page 








-GT 


add R,! (-G, 1) toQ ata, and z1 


add p [24 = Gi Ia] to é at x4 and X1—1 


Cr) 
Cr) 


endfor 


for all measurements z, do 


c? 0 0 
Q=| 0 cà 0 
0 0 c 
for all observed features zt = (ri $i si) do 
j-d 
ó- ( bx ) = ( Mya ta ) 
by Hi Pty 
q=8Tő 
| va 
2 = | atan2(d,,d2) — Lio 
8j 
vv 0 TV yay 0 
Hj = by -ôs wg. +6 0 
0 0 0 0 0 q 
add Hi? Q;' Hi toQ atx, and m; 
Lt,x 
Ht,y 
add HiT Q7' [21 — 2i + Hj 2 ] to£ at x, and m; 
jm 
Hj, 
Hj,s 
endfor 
endfor 
return Q, € 





Table 11.2 Calculation of Q and € in GraphSLAM. 
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Algorithm GraphSLAM_reduce(Q, £): 


for each feature j do 
let t(j) be the set of all poses x; at which j was observed 
subtract €), j oc £j from Ê at z,(jj and mj 
subtract OQ. Qo D ose from Q atz,(j and m; 
remove from Q and € all rows/columns corresponding to j 


endfor 


return Q, E 








Table 11.3 Algorithm for reducing the size of the information representation of the 
posterior in GraphSLAM. 





p 





Algorithm GraphSLAM. solve(Q, EQ, &): 


Eos = Q7! 

Hot = Dot Ê 

for each feature j do 
set T(j) to the set of all poses zt at which j was observed 
pj = 055 (5j + 95) fie) 

endfor 


return u, 39.1 





Table11.4 Algorithm for updating the posterior ju. 


»js.on 000000 





350 


11 The GraphSLAM Algorithm 





1: Algorithm GraphSLAM known correspondence(u; ;, 21:1, C1:t): 


Lot = GraphSLAM  initialize(u;.) 

repeat 
Q, € = GraphSLAM linearize(u;.;, 21-1, C11, HO:t) 
0, € = GraphSLAM_reduce(Q, £) 
u, Xo. = GraphSLAM_solve(Q, €, 9, £) 


until convergence 


return u 











Table 11.5 The GraphSLAM algorithm for the full SLAM problem with known cor- 
respondence. 


found by calculating a Jacobian at the estimated mean poses 4194. To build 
our initial information matrix 2 and £, we need an initial estimate /Lo:t for all 
poses zo. 

There exist a number of solutions to the problem of finding an initial mean 
p. suitable for linearization. For example, we can run an EKF SLAM and use 
its estimate for linearization. For the sake of this chapter, we will use an even 
simpler technique: Our initial estimate will simply be provided by chaining 
together the motion model p(x, | ur, z, 1). Such an algorithm is outlined 
in Table 11.1, and called there GraphSLAM initialize. This algorithm takes 
the controls u1., as input, and outputs sequence of pose estimates po:+. It 
initializes the first pose by zero, and then calculates subsequent poses by re- 
cursively applying the velocity motion model. Since we are only interested in 
the mean poses vector /Lot GraphSLAM initialize only uses the determin- 
istic part of the motion model. It also does not consider any measurement in 
its estimation. 

Once an initial /lo is available, the GraphSLAM algorithm constructs the 
full SLAM information matrix Q and the corresponding information vector £. 
This is achieved by linearizing the links in the graph. The algorithm Graph- 
SLAM linearize is depicted in Table 112. This algorithm contains a good 
amount of mathematical notation, much of which shall become clear in our 
derivation of the algorithm further below. GraphSLAM_linearize accepts as 
an input the set of controls, u1:+, the measurements z1,; and associated corre- 
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spondence variables cl:t and the mean pose estimates /0:t. It then gradually 
constructs the information matrix €? and the information vector € through lin- 
earization, by locally adding submatrices in accordance with the information 
obtained from each measurement and each control. 

In particular, line 2 in GraphSLAM linearize initializes the information 
elements. The "infinite" information entry in line 3 fixes the initial pose zo 
to (0 0 0)7. It is necessary, since otherwise the resulting matrix becomes 
singular, reflecting the fact that from relative information alone we cannot 
recover absolute estimates. 

Controls are integrated in lines 4 through 9 of GraphSLAM linearize. The 
pose ĉ and the Jacobian G, calculated in lines 5 and 6 represent the linear 
approximation of the non-linear control function g. As obvious from these 
equations, this linearization step utilizes the pose estimates j9:; 1, with uo = 
(0 0 0)7. This leads to the updates for Q, and £, calculated in lines 7, and 8, 
respectively. Both terms are added into the corresponding rows and columns 
of Q and £. This addition realizes the inclusion of a new constraint into the 
SLAM posterior, very much along the lines of the intuitive description in the 
previous section. 

Measurements are integrated in lines 10 through 21 of Graph- 
SLAM linearize. The matrix Q; calculated in line 11 is the familiar measure- 
ment noise covariance. Lines 13 through 17 compute the Taylor expansion 
of the measurement function, here stated for the feature-based measurement 
model defined in Chapter 6.6. Attention has to be payed to the implemen- 
tation of line 16, since the angular expressions can be shifted arbitrarily by 
2r. This calculation culminates into the computation of the measurement 
update in lines 18 and 19. The matrix that is being added to 1 in line 18 is 
of dimension 6 x 6. To add it, we decompose it into a matrix of dimension 
3 x 3 for the pose zt a matrix of dimension 3 x 3 for the feature m;, and 
two matrices of dimension 3 x 3 and 3 x 3 for the link between 2; and mj. 
Those are added to Q at the corresponding rows and columns. Similarly, the 
vector added to the information vector £ is of vertical dimension 5. It is also 
chopped into two vectors of size 3 and 2, and added to the elements corre- 
sponding to x; and m,, respectively. The result of GraphSLAM linearize is 
an information vector € and a matrix Q. We already noted that Q is sparse. It 
contains only non-zero submatrices along the main diagonal, between subse- 
quent poses, and between poses and features in the map. The running time 
of this algorithm is linear in t, the number of time steps at which data was 
accrued. 

The next step of the GraphSLAM algorithm pertains to reducing the di- 
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mensionality of the information matrix/vector. This is achieved through the 
algorithm GraphSLAM reduce in Table 11.3. This algorithm takes as input 
Q and € defined over the full space of map features and poses, and outputs 
a reduced matrix € and vectors £ defined over the space of all poses (but not 
the map!). This transformation is achieved by removing features m; one-at- 
a-time, in lines 4 through 9 of GraphSLAM_reduce. The bookkeeping of the 
exact indexes of each item in €? and £ is a bit tedious, hence Table 11.3 only 
provides an intuitive account. 

Line 5 calculates the set of poses 7(j) at which the robot observed feature 
j. It then extracts two submatrices from the present Ñ: 0;,; and rj) j: Qj; 
is the quadratic submatrix between m; and m;, and Q Gig 19 reed of 
the off-diagonal elements between m; and the pose variables 7(j). It also 
extracts from the information state vector Ê the elements corresponding to 
the j-th feature, denoted here as £;. It then subtracts information from Q and 
€ as stated in lines 6 and 7. After this operation, the rows and columns for 
the feature m; are zero. These rows and columns are then removed, reduc- 
ing the dimension on Q and £ accordingly. This process is iterated until all 
features have been removed, and only pose variables remain in 9 and £. The 
complexity of GraphSLAM. reduce is once again linear in t. 

The last step in the GraphSLAM algorithm computes the mean and covari- 
ance for all poses in the robot path, and a mean location estimate for all fea- 
tures in the map. This is achieved through GraphSLAM. solve in Table 11.4. 
Lines 2 and 3 compute the path estimates /19.,, by inverting the reduced in- 
formation matrix € and multiplying the resulting covariance with the infor- 
mation vector. Subsequently, GraphSLAM. solve computes the location of 
each feature in lines 4 through 7. The return value of GraphSLAM. solve 
contains the mean for the robot path and all features in the map, but only 
the covariance for the robot path. We note that there exist other, more ef- 
ficient ways to compute Ho:t that bypass the matrix inversion step. Those 
will be discussed towards the end of this chapter, when applying standard 
optimization techniques to GraphSLAM. 

The quality of the solution calculated by the GraphSLAM algorithm de- 
pends on the goodness of the initial mean estimates, calculated by Graph- 
SLAM initialize. The x- and y- components of these estimates affect the 
respective models in a linear way, hence the linearization does not depend 
on these values. Not so for the orientation variables in 194. Errors in these 
initial estimates affect the accuracy of the Taylor approximation, which in 
turn affects the result. 

To reduce potential errors due to the Taylor approximation in the lin- 
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earization, the procedures GraphSLAM linearize, GraphSLAM reduce, 
and GraphSLAM solve are run multiple times over the same data set. Each 
iteration takes as an input an estimated mean vector /o: from the previous 
iteration, and outputs a new, improved estimate. The iteration of the Graph- 
SLAM optimization are only necessary when the initial pose estimates have 
high error (e.g., more than 20 degrees orientation error). A small number of 
iterations (e.g., 3) is usually sufficient. 

Table 11.5 summarizes the resulting algorithm. It initializes the means, 
then repeats the construction step, the reduction step, and the solution step. 
Typically, two or three iterations suffice for convergence. The resulting mean 
u is our best guess of the robot's path and the map. 


Mathematical Derivation of GraphSLAM 


The derivation of the GraphSLAM algorithm begins with a derivation of a 
recursive formula for calculating the full SLAM posterior, represented in in- 
formation form. We then investigate each term in this posterior, and derive 
from them the additive SLAM updates through Taylor expansions. From 
that, we will derive the necessary equations for recovering the path and the 
map. 


The Full SLAM Posterior 


Asin the discussion of EKF SLAM, it will be beneficial to introduce a variable 
for the augmented state of the full SLAM problem. We will use y to denote 
state variables that combine one or more poses x with the map m. In partic- 
ular, we define yo.; to be a vector composed of the path xo, and the map m, 
whereas y; is composed of the momentary pose at time t and the map m: 


zx 
Yot = : and y, = ( j) 
m 


The posterior in the full SLAM problem is p(yo: | 21:4, u1:4, 014), Where 21: 
are the familiar measurements with correspondences clty and v. are the 
controls. Bayes rule enables us to factor this posterior: 


P(Yo:t | Zt, U1:t; Crt) 
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XE n p(z | Yorts Z1:t—1; Ust; Crit) P(Yo:t | 21-1, Unit, Cit) 


where 1 is the familiar normalizer. The first probability on the right-hand 
side can be reduced by dropping irrelevant conditioning variables: 


plz | Yo:ts Z1:t-1, U1:t, C1:t) Z pz | Yt, Ct) 


Similarly, we can factor the second probability by partitioning yo:; into z; and 
Yo:t—1, and obtain 


P(Yo:t | Z21:t-15 U1:t; C1:t) 


p(z | Yo:t—1; Ž1:t—1; U1:t, C1:t) P(Yo:t—1 | Zi4—1, Ut; Crit) 





= p(xt | £t—1, Ut) P(Yo:t 1 | zu 1, U1:t—1, C1:t 1) 


Putting these expressions back into (11.5) gives us the recursive definition of 
the full SLAM posterior: 


P(Yo:t | Zi, Ust; Crit) 


— n p(z | Yt, Ce) D(xt | Tt—1, Ut) p (yos 1 | 21:t—-1, U1:t-1, C1:t 1) 





The closed form expression is obtained through induction over t. Here p(yo) 
is the prior over the map m and the initial pose zo. 


P(Yo:t | Zi, Ut; Crit) = n p(yo) LH» | Xt-1, Ut) plz | Yt, Ct) 
t 


II 


n plyo) [J (pæl xiu). [] (zi | ve 


t 


Here, as before, 2? is the i-th measurement in the measurement vector z, at 
time t. The prior p(yo) factors into two independent priors, p(zo) and p(m). 
In SLAM, we usually have no prior knowledge about the map m. We simply 
replace p(yo) by p(zo) and subsume the factor p(m) into the normalizer n. 


The Negative Log Posterior 


The information form represents probabilities in logarithmic form. The log- 
SLAM posterior follows directly from the previous equation: 


log p(yo: | zie; uiis cia) 


= const. +logp(xo) + 5 log p(z« | 1-1, ut) + >》 log p(z; | ye ct) 


t i 
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Just as in Chapter 10, we assume the outcome of robot motion is distributed 
normally according to A (g(u;, x;..1), Rt), where g is the deterministic motion 
function, and R; is the covariance of the motion error. Similarly, measure- 
ments zj are generated according to V (h(yi, ci), Q+), where h is the familiar 
measurement function and Q; is the measurement error covariance. In equa- 
tions, we have 


p(xe|zii,w) = n exp(- (zc — glut, zii)! Rz. (we — glu, zia))] 
P(% | Yes cb) n exp {-5 (24 — hun e)" Qr? Gi — h(ee))] 
The prior p(zo) in (11.10) is also easily expressed by a Gaussian-type distri- 


bution. It anchors the initial pose xo to the origin of the global coordinate 
system: xo = (0 0 0)7: 


D(Zo) = qm exp { 一 亏 zl Qo xo} 
with 

co 0 0 
Qo = 0 CO 0 

0 0 oo 


For now, it shall not concern us that the value of oo cannot be implemented, 
as we can easily substitute oo with a large positive number. This leads to the 
following quadratic form of the negative log-SLAM posterior in (11.10): 


=, log p(yo:t | Z1:t, Ult; Cit) 


= const. +4 |zj Qo zo + So (xt — glut, 24 1))T. Rp! (xe — glut, 21 1)) 
t 


p» zi — hlyn d))7 Qt (zi — hlyn, di) 


This is "—— the same as JarapnsrAM in Equation (11.3), with a few dif- 
ferences pertaining to the omission of normalization constants (including a 
multiplication with —1). Equation (11.15) highlights an essential character- 
istic of the full SLAM posterior in the information form: It is composed of a 
number of quadratic terms, one for the prior, and one for each control and 
each measurement. 


Taylor Expansion 


The various terms Equation (11.15) are quadratic in the functions g and h, not 
in the variables we seek to estimate (poses and the map). GraphSLAM alle- 
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LINEARIZATION viates this problem by linearizing g and h via Taylor expansion—completely 
analogously to Equations (10.14) and (10.18) in the derivation of the EKF. In 
particular, we have: 


(11.16) g(we,te-1) &  g(uc pii) + Ge(@e-1 — i1) 

(11.17) hyc) f£ hle, ct) + Ht (ye — pe) 
Here u is the current estimate of the state vector y, and Hj = hi Fy j as 
defined already in Equation (10.19). 


This linear approximation turns the log-likelihood (11.15) into a function 
that is quadratic in yo... In particular, we obtain 


(11.18) — logp(yos| zuo 14,014). = const. — i 


{38 Qo xo + Som — gue, i1) — Gira 一 AD 
t 
Ry’ [we — glut, 11) — Gira — -1 )] 
tla — h(j, ci) - Hi (ye — no]. Qr [6 — hus.) — Hi — roi} 


This function is indeed a quadratic in yo, and it shall prove convenient to 
reorder its terms, omitting several constant terms. 


(11.19) log p(yo:t | 21:2, Wit, C14) = const. 


T 
1 T 1 T 一 Gi —1 
—5 Zo Qo Xo -4% Tt—i:t ( 1 ) Ry (Gi 1) tiit 
—— 
quadratic in xo ———————— 


quadratic in v4~1:4 
T "UE ug 
TY, qa 1 Ry [g(ue, pai) — Ge i-i] 


linear in z;— 1: 


a u HiT Qj Hiw + ye Hp QU [ei - hus d) + Hin] 
一 一 
quadratic in y: linear in y; 


Here x;_1., denotes the state vector concatenating z;..; and z+; hence we can 
; —GT 
write (a, — Gi 2Zt_1) 工 eL lu (一 G, 1)7 Eus ( 2. ) 
If we collect all quadratic terms into the matrix Q, and all linear terms into 
a vector £, we see that expression (11.19) is of the form 


(11.20) log p(yo:t | zi, u14,C14) = const. — 5 i YO: t Q yos 十 Yo: eG 


5rjs.cn 000000 


11.4.4 


(11.21) 


(11.22) 


(11.23) 


(11.24) 
(11.25) 


11.4 Mathematical Derivation of GraphSLAM 357 


Constructing the Information Form 


We can read off these terms directly from (11.19), and verify that they are 
indeed implemented in the algorithm GraphSLAM_linearize in Table 11.2: 


* Prior. The initial pose prior manifests itself by a quadratic term Qo over 
the initial pose variable zo in the information matrix. Assuming appro- 
priate extension of the matrix Qo to match the dimension of yo.;, we have 


Q — Qo 


This initialization is performed in lines 2 and 3 of the algorithm Graph- 
SLAM_linearize. 


e Controls. From (11.19), we see that each control u; adds to Q and £ the 
following terms, assuming that the matrices are rearranged so as to be of 
matching dimensions: 


cor p 
9 — +f 7 JR'CG 1) 


(xr 
bt es ( ri Je [g (ut, i-i) — Gt pii] 


This is realized in lines 4 through 9 in GraphSLAM_linearize. 


e Measurements. According to Equation (11.19), each measurement 2} 
transforms Q and € by adding the following terms, once again assuming 
appropriate adjustment of the matrix dimensions: 


Q — mp H? 
E — E+ Hi Qi [å — hlm, ci) + Hiu] 


This update occurs in lines 10 through 21 in GraphSLAM_linearize. 


This proves the correctness of the construction algorithm Graph- 
SLAM linearize, relative to our Taylor expansion approximation. 

We also note that the steps above only affect off-diagonal elements that 
involve at least one pose. Thus, all between-feature elements are zero in the 
resulting information matrix. 
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Marginals of a multivariate Gaussian. Let the probability distribution 
p(x, y) over the random vectors x and y be a Gaussian represented in the 
information form 


Oyo Nyy £y 
If Qy, is invertible, the marginal p(x) is a Gaussian whose information rep- 
resentation is 


Dg = Ner ue MO and t] EO OS B 


Proof. The marginal for a Gaussian in its moments parameterization 


X M x 
Dyr Xyy Hy 
is N (Hs, Uexr). By definition, the information matrix of this Gaussian is 


therefore 2z and the information vector is £z} Hs. We show Uz} = 0,, 
via the Inversion Lemma from Table 3.2 on page 50. Let P = (0 1)7, and 
let [oo] be a matrix of the same size as (wy but whose entries are all infinite 
(with [oo]! = 0). This gives us 


ANIM C 


The same expression can also be expanded by the inversion lemma into: 


(Q + P | s 


Q P([o] ! + PT Q P) ! P" Q 
= ee PTQ 
Q P(Qy 
(o a) 
0 qs 


dr aed 
Q Q Q 
Q Q Q 
Qye Qyy 0 1 Qye Qyy 
Q Q 
Q Q 


1 


see next page for continuation 
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continued from the previous page 








The remaining statement, Nee £,, is obtained analogously, exploiting 
the fact that u = Q^ !£ (see Equation(3.73)) and the equality of the two 
expressions marked “(x)” above: 


[ow wq see Tu 
£u hc 2 
E cure 


Table 11.6 Lemma for marginalizing Gaussians in information form. The form of 
the covariance (2,4 in this lemma is also known as Schur complement. 


~ 
* 














Conditionals of a multivariate Gaussian. Let the probability distribution 
p(x, y) over the random vectors x and y be a Gaussian represented in the 
information form 


Q = d d 一 
fo 


The conditional p(x | y) is a Gaussian with information matrix Q,, and 
information vector £; — (15, y. 


Proof. The result follows trivially from the definition of a Gaussian in in- 
formation form: 


p(x | y) 
T T 
= exp 4 —5 + 
: : ( y ) ( Oy; Qyy y y Ey 
= 7 exp {- 4r Our 一 x Qeyy 一 Sy! Qyyy + at Ex + y" Ey} 


= mexp(-izTQ,,r +a" (Es — Qeyy) -3y Qyyy + y Ey} 
———— 


const. 











zog exp(- iz] Oy, “le a (£; E O,,y)) 





Table 11.7 Lemma for conditioning Gaussians in information form. 
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Reducing the Information Form 


The reduction step GraphSLAM_reduce is based on a factorization of the 
full SLAM posterior. 
P(Yo:t | Zus; uias Crt) = pto | zuc was Crt) PON | Lore, 21-4, Ure, 014) 


Here p(xoa | 21:4, U1, C14) ~ M(E, Q) is the posterior over paths alone, with 
the map integrated out: 


P(Lo:t | 23:4, 01:1, C1:t) ms Jos | 23:45 01:5 C1:t) dm 


As we will show shortly, this probability is indeed calculated by the algo- 
rithm GraphSLAM reduce in Table 11.3, since 


p(zo | Zi, U1st; Crt) ™ N(E,Q) 


In general, the integration in (11.27) will be intractable, due to the large num- 
ber of variables in m. For Gaussians, this integral can be calculated in closed 
form. The key insight is given in Table 11.6, which states and proves the 
marginalization lemma for Gaussians. 

Let us subdivide the matrix 2 and the vector € into submatrices, for the 
robot path 29., and the map m: 


Q — ( E Qrim ) 


ous no Omm 


— ( Sx ) 
Rex 
According to the marginalization lemma, the probability (11.28) is obtained as 


m, —1 
Q = Or o.40:¢ ES zoom Om Om, zo. 


E = ne = (oim Lp sean Em 


The matrix Q is block-diagonal. This follows from the way €? is con- 
structed, in particular the absence of any links between pairs of features. 
This makes the inversion efficient: 


—1 T (-1 ; 
Qm DE 2. F} 05 Fj 
J 


where Q;; = FjQFT is the sub-matrix of © that corresponds to the j-th 
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feature in the map: 


0---0 100 0---0 
0---0 010 0---0 
0---0 001 0---0 
—-—— 
j—th feature 
This insight makes it possible to decompose the implement Equations (11.31) 
and (11.32) into a sequential update: 


C —1 
Q Qro: Tot 一 X Oso j 05 Quo, 


J 
€ = xo. — Se ay 057 Ej 
j 


The matrix 2,,.,,; is non-zero only for elements in 7(7), the set of poses at 
which feature j was observed. This essentially proves the correctness of the 
reduction algorithm GraphSLAM reduce in Table 11.3. The operation per- 
formed on €? in this algorithm can be thought of as the variable elimination 
algorithm for matrix inversion, applied to the feature variables but not the 
robot pose variables. 


Recovering the Path and the Map 


The algorithm GraphSLAM solve in Table 11.4 calculates the mean and 
variance of the Gaussian AV (£, Q), using the standard equations, see Equa- 
tions (3.72) and (3.73) on page 72: 


z = 
à = dE 
In particular, this operation provides us with the mean of the posterior on 


the robot path; it does not give us the locations of the features in the map. 
It remains to recover the second factor of Equation (11.26): 


—1 


eoi 


p(m | zo; ze; ui; Crt) 
The conditioning lemma, stated and proved in Table 11.7, shows that this prob- 


ability distribution is Gaussian with the parameters 


Ym = a! 


m,m 


Xn (Em + Omis é) 


Lm 
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Here ém and Qm m are the subvector of €, and the submatrix of Q, respec- 
tively, restricted to the map variables. The matrix Qm,2,., is the off-diagonal 
submatrix of Q that connects the robot path to the map. As noted before, 
(15.4, is block-diagonal, hence we can decompose 


p(m | LO:ty Zt; Urt; Cut) 二 lI»; | LO:ty Z1:t, U1:t; C1:t) 
j 


where each p(m; | Zo 21:2, U1:t; C14) is distributed according to 
Xp Exc 
pj = Xl t Njel) = NE; c0, 
The last transformation exploited the fact that the submatrix €); zo., is zero ex- 
cept for those pose variables 7(7) from which the j-th feature was observed. 
It is important to notice that this is a Gaussian p(m | zo, zi4, Uist, 014) 
conditioned on the true path xo... In practice, we do not know the path, 
hence one might want to know the posterior p(m | zj4,u14,c14) with- 
out the path in the conditioning set. This Gaussian cannot be factored in 
the moments parameterization, as locations of different features are corre- 
lated through the uncertainty over the robot pose. For this reason, Graph- 
SLAM. solve returns the mean estimate of the posterior but only the covari- 
ance over the robot path. Luckily, we never need the full Gaussian in mo- 
ments representation—which would involve a fully populated covariance 
matrix of massive dimensions—as all essential questions pertaining to the 
SLAM problem can be answered at least in approximation without know- 
ledge of X. 


Data Association in GraphSLAM 


Data association in GraphSLAM is realized through correspondence variables, 
just as in EKF SLAM. GraphSLAM searches for a single best correspondence 
vector, instead of calculating an entire distribution over correspondences. 
Thus, finding a correspondence vector is a search problem. However, it shall 
prove convenient to define correspondences slightly differently in Graph- 
SLAM than before: correspondences are defined over pairs of features in the 
map, rather than associations of measurements to features. Specifically, we 
say c(j, k) = 1 if m; and m; correspond to the same physical feature in the 
world. Otherwise, c(j, k) — 0. This feature-correspondence is in fact logi- 
cally equivalent to the correspondence defined in the previous section, but it 
simplifies the statement of the basic algorithm. 
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The technique for searching the space of correspondences is greedy, just 
as in the EKF. Each step in the search of the best correspondence value leads 
to an improvement, as measured by the appropriate log-likelihood function. 
However, because GraphSLAM has access to all data at the same time, it 
is possible to devise correspondence techniques that are considerably more 
powerful than the incremental approach in the EKF. In particular: 


1. At any point in the search, GraphSLAM can consider the correspondence 
of any set of features. There is no requirement to process the observed 
features sequentially. 


2. Correspondence search can be combined with the calculation of the map. 
Assuming that two observed features correspond to the same physical 
feature in the world affects the resulting map. By incorporating such a 
correspondence hypothesis into the map, other correspondence hypothe- 
ses will subsequently look more or less likely. 


3. Data association decisions in GraphSLAM can be undone. The goodness 
of a data association depends on the value of other data association de- 
cisions. What appears to be a good choice early on in the search may, at 
some later time in the search, turn out to be inferior. To accommodate such 
a situation, GraphSLAM can effectively undo a previous data association 
decision. 


We will now describe one specific correspondence search algorithm that ex- 
ploits the first two properties, but not the third. The data association algo- 
rithm will still be greedy, and it will sequentially search the space of possi- 
ble correspondences to arrive at a plausible map. However, like all greedy 
algorithms, our approach is subject to local maxima; the true space of cor- 
respondences is of course exponential in the number of features in the map. 
Nevertheless, we will be content with a hill climbing algorithm and postpone 
the treatment of an exhaustive algorithm to the next chapter. 


The GraphSLAM Algorithm with Unknown Correspondence 


The key component of our algorithm is a likelihood test for correspondence. 
Specifically, GraphSLAM correspondence is based on a simple test: What 
is the probability that two different features in the map, m; and mx, corre- 
spond to the same physical feature in the world? If this probability exceeds 
a threshold, we will accept this hypothesis and merge both features in the 
map. 
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1: Algorithm GraphSLAM_correspondence_test(Q, £, u, 20:t j, k): 
Qua — Bjkjk — Qjk,r Gk) Ur Gk) TGk) QrG,k) jk 
3: Sy] = Qa) Hk 


T 
4: OA; k = : Q I 
i Apk a DE 1 
T 
1 
5: ÉAjk = | = SL] 


z fu, —1 
6: BAjk = QA5 CAL 


1 — 
7: return |27 QAI 2 exp {-3 UR Sk AE Haje} 








Table 11.8 The GraphSLAM test for correspondence: It accepts as input an infor- 
mation representation of the SLAM posterior, along with the result of the Graph- 
SLAM. solve step. It then outputs the posterior probability that m; corresponds to 
Mk. 


The algorithm for the correspondence test is depicted in Table 11.8: The 
input to the test are two feature indexes, j and k, for which we seek to com- 
pute the probability that those two features correspond to the same feature 
in the physical world. To calculate this probability, our algorithm utilizes a 
number of quantities: The information representation of the SLAM posterior, 
as manifest by €? and £, and the result of the procedure GraphSLAM solve, 
which is the mean vector u and the path covariance Xo. 

The correspondence test then proceeds in the following way: First, it com- 
putes the marginalized posterior over the two target features. This posterior 
is represented by the information matrix Qy; x and vector £i; ;;j computed in 
lines 2 and 3 in Table 11.8. This step of the computation utilizes various sub- 
elements of the information form Q, £, the mean feature locations as specified 
through yi, and the path covariance Xo:;. Next, it calculates the parameters of 
a new Gaussian random variable, whose value is the difference between m; 
and mą. Denoting the difference variable A;; = m; — mx, the information 
parameters (24; 5, £A; are calculated in lines 4 and 5, and the correspond- 
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1: Algorithm GraphSLAM(u1:+, 21:1): 


initialize all cj with a unique value 

Hot = GraphSLAM  initialize(u.;) 

Q, € = GraphSLAM linearize(u.;, 21-4, C14, 10:4) 
0, € = GraphSLAM. reduce(Q, £) 

u, So = GraphSLAM solve(CQ, €, ©, £) 

repeat 


for each pair of non-corresponding features mj, mj, do 


Tj=k = GraphSLAM correspondence test 
(Q, £, Hs M0: 3; k) 


10: if tj;=~ > x then 

11: for all c = k set ci = j 

12: Q, E = GraphSLAM linearize(u;.;, 21:1, C10, Lot) 
13: 0, € = GraphSLAM. reduce(Q, £) 

14: u, Xo. = GraphSLAM solve(Q,£, 2, £) 

15: endif 

16: endfor 

17: until no more pair mj, mj, found with Tj=ẹ < x 

18: return u 








Table 11.9 The GraphSLAM algorithm for the full SLAM problem with unknown 
correspondence. The inner loop of this algorithm can be made more efficient by selec- 
tive probing feature pairs mj, mg, and by collecting multiple correspondences before 
solving the resulting collapsed set of equations. 


ing expectation for the difference is computed in line 6. Line 7 returns the 
probability that the difference between m; and mj, is zero. 

The correspondence test provides us with an algorithm for performing 
data association search in GraphSLAM. Table 11.9 shows such an algorithm. 
It initializes the correspondence variables with unique values. The four steps 
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that follow (lines 3-7) are the same as in our GraphSLAM algorithm with 
known correspondence, stated in Table 11.5. However, this general SLAM 
algorithm then engages in the data association search. Specifically, for each 
pair of different features in the map, it calculates the probability of corre- 
spondence (line 9 in Table 11.9). If this probability exceeds a threshold x, the 
correspondence vectors are set to the same value (line 11). 

The GraphSLAM algorithm iterates the construction, reduction, and solu- 
tion of the SLAM posterior (lines 12 through 14). As a result, subsequent 
correspondence tests factor in previous correspondence decisions though a 
newly constructed map. The map construction is terminated when no fur- 
ther features are found in its inner loop. 

Clearly, the algorithm GraphSLAM is not particularly efficient. In particu- 
lar, it tests all feature pairs for correspondence, not just nearby ones. Further, 
it reconstructs the map whenever a single correspondence is found; rather 
than processing sets of corresponding features in batch. Such modifications, 
however, are relatively straightforward. A good implementation of Graph- 
SLAM will be more refined than our basic implementation discussed here. 


Mathematical Derivation of the Correspondence Test 


We essentially restrict our derivation to showing the correctness of the cor- 
respondence test in Table 11.8. Our first goal shall be to define a posterior 
probability distribution over a variable Ajk = m; — mx, the difference be- 
tween the location of feature m; and feature mp. Two features m; and mk 
are equivalent if and only if their location is the same. Hence, by calculat- 
ing the posterior probability of A; xy we obtain the desired correspondence 
probability. 

We obtain the posterior for A; ;, by first calculating the joint over m; and 
Mk: 


p(m;, Mk | Zi Wits Cit) 
= ] rmi m | rt, Zit, C14) D(is | Zit, Ure, Crt) dta 


We will denote the information form of this marginal posterior by £r; ;; and 
Q5.,j. Note the use of the squared brackets, which distinguish these values 
from the submatrices of the joint information form. 

The distribution (11.45) is obtained from the joint posterior over yo:;, by 
applying the marginalization lemma. Specifically, Q and € represent the joint 
posterior over the full state vector yo.; in information form, and 7(j) and 7(k) 


»js.on 000000 


(11.46) 


(11.47) 


(11.48) 


(11.49) 


(11.50) 


11.5 Data Association in GraphSLAM 367 


denote the sets of poses at which the robot observed feature j, and feature 
k, respectively. GraphSLAM gives us the mean pose vector ñ. To apply the 
marginalization lemma (Table 11.6), we shall leverage the result of the al- 
gorithm GraphSLAM solve. Specifically, GraphSLAM. solve provides us 
already with a mean for the features m; and mp. We simply restate the com- 
putation here for the joint feature pair: 


Hj] = Oege (im + oe Ga) Hr.) 
Here 7(j, k) = T(j)UT(k) denotes the set of poses at which the robot observed 
Mj Or Mk. 

For the joint posterior, we also need a covariance. This covariance is not 
computed in GraphSLAM. solve, simply because the joint covariance over 
multiple features requires space quadratic in the number of features. How- 
ever, for pairs of features the covariance of the joint is easily recovered. 

Let © (;,%),7(¢j,4) be the submatrix of the covariance 5o, restricted to all 
poses in 7(j, k). Here the covariance Xo. is calculated in line 2 of the al- 
gorithm GraphSLAM. solve. Then the marginalization lemma provides us 
with the marginal information matrix for the posterior over (m; mg)”: 


Opa] —  Qjk — Qjkr Gk) UrG.k).7G.k) QrG kk 

The information form representation for the desired posterior is now com- 
pleted by the following information vector: 

Sp = Oye) Mp] 


Hence we have for the joint 


p(m;, Mk | Z1:t; U1:t, C1:t) 
T 


T 
asp) f OL) ta) 


These equations are identical to lines 2 and 3 in Table 11.8. 

The nice thing about our representation is that it immediately lets us define 
the desired correspondence probability. For that, let us consider the random 
variable 


Aj k = Mi — Mk 
E 1 \" (om; 
B —1 Mk 

T 
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Plugging this into the definition of a Gaussian in information representation, 
we obtain: 
pA | Zi, U1:t; Ct) 


js 1 p 
= vol — in ( = ) Op x] ( E )s can ( 1 ) Ezk] | 
一 一 — 


=: QAj,k =: €Aj,k 


T 
= nexp{—3 AŞk OAj + AT, CAjk] 


which is Gaussian with information matrix Qaj, and information vector 
£Aj as defined above. To calculate the probability that this Gaussian as- 
sumes the value of Aj, = 0, it shall be useful to rewrite this Gaussian in 
moments parameterization: 


P(Aj | Z1:t, uii; Crt) 

= 2m TORRE exp {-3 (Ajk — Mage)” QA (Ajk — hase) 
where the mean is given by the obvious expression: 
HAj,k = ORF E Aj,k 


These steps are found in lines 4 through 6 in Table 11.8. 
The desired probability for A; = 0 is the result of plugging 0 into this 
distribution, and reading off the resulting probability: 


1 = 
pA; x —Ü0lzuíounna) = 2x Qu 2 exp [-3 UA TR pase} 


This expression is the probability that two features in the map, m; and m;, 
correspond to the same features in the map. This calculation is implemented 
in line 7 in Table 11.8. 


Efficiency Consideration 


Practical implementations of GraphSLAM rely on a number of additional 
insights and techniques for improving efficiency. Possibly the biggest defi- 
ciency of GraphSLAM, as discussed thus far, is due to the fact that in the very 
beginning, we assume that all observed features constitute different features. 
Our algorithm unifies them one-by-one. For any reasonable number of fea- 
tures, such an approach will be unbearably slow. Further, it will neglect the 
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important constraint that at any point in time, the same feature can only be 
observed once, but not twice. 

Existing implementations of the GraphSLAM idea exploit such opportu- 
nities. 

Features that are immediately identified to correspond with high likeli- 
hood are often unified form the very beginning, before running the full 
GraphSLAM solution. For example, it is quite common to compile short 
segments into local submaps, e.g., local occupancy grid maps. GraphSLAM 
inference is then performed only between those local occupancy grid maps, 
where the match of two maps is taken as a probabilistic constraint between 
the relative poses of these maps. Such a hierarchical technique reduces the 
complexity of SLAM by orders of magnitude, while still retaining some of 
the key elements of GraphSLAM, specifically the ability to perform data as- 
sociation over large data sets. 

Many robots are equipped with sensors that observe large number of fea- 
tures at a time. For example, laser range finders observe dozens of features 
within a single scan. For any such scan, one commonly assumes that dif- 
ferent measurements indeed correspond to different features in the environ- 
ment, by virtue of the fact that each scan points in a different direction. This 
is known as the mutual exclusion principle, which was already discussed in 
Chapter 7. It therefore follows that i Z j — ci # ci . No two measurements 
acquired within a single scan correspond to the same feature in the world. 

Our pairwise data association technique above is unable to incorporate 
this constraint. Specifically, it may assign two measurements z/ and zy to the 
same feature z^ for some s 7 t. To overcome this problem, it is common 
to associate entire measurement vectors z and z, at the same time. This in- 
volves a calculation of a joint over all features in z; and zs. Such a calculation 
generalizes our pairwise calculation and is mathematically straightforward. 

The GraphSLAM algorithm stated in this chapter does not make use of its 
ability to undo a data association. Once a data association decision is made, 
it cannot be reverted further down in the search. Mathematically, it is rela- 
tively straightforward to undo past data association decisions in the infor- 
mation framework. One can change the correspondence variables of any 
two measurements in arbitrary ways in our algorithm above. However, it is 
more difficult to test whether a data association should be undone, as there 
is no (obvious) test for testing whether two previously associated features 
should be distinct. A simple implementation involves undoing a data as- 
sociation in question, rebuilding the map, and testing whether our criterion 
above still calls for correspondence. Such an approach can be computation- 
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Figure 11.4 The Groundhog robot is a 1,500 pound custom-built vehicle equipped 
with onboard computing, laser range sensing, gas and sinkage sensors, and video 
recording equipment. The robot has been built to map abandoned mines. 


ally involved, as it provides no means of detecting which data association to 
test. Mechanisms for detecting unlikely associations are outside the scope of 
this book, but should be considered when implementing this approach. 

Finally, the GraphSLAM algorithm does not consider negative information. 
In practice, not seeing a feature can be as informative as seeing one. How- 
ever, our simple formulation does not perform the necessary geometric com- 
putations. 

In practice, whether or not we can exploit negative information depends 
on the nature of the sensor model, and the model of our features in the world. 
For example, we might have to compute probabilities of occlusion, which 
might be tricky for certain type sensors (e.g., range and bearing sensors for 
landmarks). However, contemporary implementations indeed consider neg- 
ative information, but often by replacing proper probabilistic calculations 
through approximations. One such example will be given in the next section. 


Empirical Implementation 


We will now highlight empirical results for a GraphSLAM implementation. 
The vehicle used in our experiment is Figure 11.4; it is a robot designed to 
map abandoned mines. 

The type map collected by the robot is shown in Figure 11.5. This map is an 
occupancy grid map, using effectively pairwise scan matching for recovering 
the robot’s poses. Pairwise scan matching can be thought of as a version of 
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Figure 11.5 Map of a mine, acquired by pairwise scan matching. The diameter of 
this environment is approximately 250 meters. The map is obviously inconsistent, 
in that several hallways show up more than once. Image courtesy of Dirk Hähnel, 
University of Freiburg. 
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Figure 11.6 Mine map skeleton, visualizing the local maps. 


GraphSLAM, but correspondence is only established between immediately 
consecutive scans. The result of this approach leads to an obvious deficiency 
of the map shown in Figure 11.5. 

To apply the GraphSLAM algorithm, our software decomposes the map 
into small local submaps, one for each five meters of robot travel. Within 
these five meters, the maps are sufficiently accurate, as general drift is small 
and hence scan matching performs essentially flawlessly. Each submap's co- 
ordinates become a pose node in the GraphSLAM. Adjacent submaps are 
linked through the relative motion constraints between them. The resulting 
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Figure 11.7 Data association search. See text. 
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Figure 11.8 Final map, after optimizing for data associations. Image courtesy of 
Dirk Hahnel, University of Freiburg. 
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Figure 11.9 Mine map generated by the Atlas SLAM algorithm by Bosse et al. (2004). 
Image courtesy of Michael Bosse, Paul Newman, John Leonard, and Seth Teller, MIT. 


structure is shown in Figure 11.6. 

Next, we apply the recursive data association search. The correspondence 
test is now implemented using a correlation analysis for two overlaying 
maps, and the Gaussian matching constraints are recovered by approximat- 
ing this match function through a Gaussian. Figure 11.7 illustrates the pro- 
cess of data association: The circles each correspond to a new constraint that 
is imposed when constructing the information form with GraphSLAM. This 
figure illustrates the iterative nature of the search: Certain correspondences 
are only discovered when others have been propagated, and others are dis- 
solved in the process of the search. The final model is stable, in that addi- 
tional search for new data association induces no further changes. Displayed 
as a grid map, it yields the 2-D map shown in Figure 11.8. While this map 
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is far from being perfect—largely due to a crude implementation of the lo- 
cal map matching constraints—it nevertheless is superior to the one found 
through incremental scan matching. 

The reader should notice that other information-theoretic techniques for 
SLAM have produced similar results. Figure 11.9 shows a map of the same 
data set generated by Bosse et al. (2004), using an algorithm called Atlas. 
This algorithm decomposes maps into submaps whose relation is maintained 
through information-theoretic relative links. See the bibliographical remarks 
for more detail. 


Alternative Optimization Techniques 


The reader may recall that the central target function .JGraphsLAM in Graph- 
SLAM is the nonlinear quadratic function in Equation (11.3). GraphSLAM 
minimizes this function through a sequence of linearizations, variable elim- 
inations, and optimizations. The inference technique in GraphSLAM solve 
in Table 11.4 is generally not very efficient. If all one is interested in is a map 
and a path without covariances, the calculation of the inverse in line 2 in 
Table 11.4 can be avoided. The resulting implementation will be computa- 
tionally much more efficient. 

The key to efficient inference lies in the form of the function JarapnsLAM- 
This function is of a general least squares form, and hence can be minimized 
by a number of different algorithms in the literature. Examples include gra- 
dient descent techniques, Levenberg Marquardt, and conjugate gradient. 

Figure 11.10a shows results obtained using conjugate gradient for minimiz- 
ing JarapnsnLAM. The data is a map of an outdoor environment of the ap- 
proximate size 600 by 800 meters, collected on Stanford's campus with the 
robot shown in Figure 11.10b. Figure 11.11 illustrates the alignment process, 
from a data set that is based on pose data only, to a fully aligned map and 
robot path. This dataset contains approximately 10? features and 10? poses. 
Running an EKF would be infeasible on such a large data set. As would be 
inverting the matrix 2 in Table 11.4. Conjugate gradient required only a few 
seconds to minimize JarapnsrAM. For this reason, many contemporary im- 
plementations of this approach use modern optimization techniques, instead 
of the relatively slow algorithm discussed here. The interested reader shall 
be referred to the bibliographical remarks for more pointers to alternative 
optimization techniques. 
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Figure 11.10 (a) A 3-D map of Stanford's campus. (b) The robot used for acquir- 
ing this data is based on a Segway RMP platform, whose development was funded 
by the DARPA MARS program. Image courtesy of Michael Montemerlo, Stanford 


University. (Turn this page 90 degrees to the right to view this figure.) 
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Figure 11.11 2-D slice through the Stanford campus map (a) before and (b) after 
alignment using conjugate gradient. Such an optimization takes only a few seconds 
with the conjugate gradient method applied to the least square formulation of Graph- 
SLAM. Images courtesy of Michael Montemerlo, Stanford University. 
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Summary 


This chapter introduced the GraphSLAM algorithm to the full SLAM prob- 
lem. 


GraphSLAM algorithm addresses the full SLAM problem. It calculates 
posteriors over the full robot path along with the map. Therefore, Graph- 
SLAM is a batch algorithm, not an online algorithm like EKF SLAM. 


GraphSLAM constructs a graph of soft constraints from the data set. In 
particular, measurements are mapped into edges that represent nonlinear 
constraints between poses and sensed features, and motion commands 
map into soft constraints between consecutive poses. The graph is nat- 
urally sparse. The number of edges is a linear function of the number 
of nodes, and each node is only connected to finitely many other nodes, 
regardless of the size of the graph. 


GraphSLAM simply records all this information in the graph, through 
links that are defined between poses and features, and pairs of subse- 
quent poses. However, this information representation does not provide 
estimates of the map or of the robot path. 


The sum of all such constraints is given by a function .GraphsLAM. The 
maximum likelihood estimates for the robot path and the map can be ob- 
tained by minimizing this function Ja:apnsr AM: 


GraphSLAM performs inference by mapping the graph into an isomor- 
phic information matrix and information vector, defined over all pose 
variables and the entire map. The key insight of the GraphSLAM algo- 
rithm is that the structure of information is sparse. Measurements pro- 
vide information of a feature relative to the robot's pose at the time of 
measurement. In information space, they form constraints between these 
pairs of variables. Similarly, motion provides information between two 
subsequent poses. In information space, each motion command forms a 
constraint between subsequent pose variables. This sparseness is inher- 
ited from the sparse graph. 


The vanilla GraphSLAM algorithm recovers maps through an iterative 
procedure that involves three steps: Construction of a linear information 
form through Taylor expansion, reduction of this form to remove the map, 
and solving the resulting optimization problem over robot poses. These 
three steps effectively resolve the information, and produce a consistent 
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probabilistic posterior over the path and the map. Since GraphSLAM is 
run batch, we can repeat the linearization step to improve the result. 


* Alternative implementations perform inference through nonlinear least 
squares optimization of the function .JGraphSLAM. However, such tech- 
niques only find the mode of the posterior, not its covariance. 


e Data association in GraphSLAM is performed by calculating the probabil- 
ity that two features have identical world coordinates. Since GraphSLAM 
is a batch algorithm, this can be done for any pair of features, at any time. 
This led to an iterative greedy search algorithm over all data association 
variables, which recursively identifies pairs of features in the map that 
likely correspond. 


* Practical implementations of the GraphSLAM often use additional tricks 
to keep the computation low and to avoid false data associations. Specif- 
ically, practical implementations tend to reduce the data complexity by 
extracting local maps and using each map as the basic entity; they tend 
to match multiple features at-a-time, and they tend to consider negative 
information in the data association search. 


* We briefly provided results for a variant of GraphSLAM that follows the 
decomposition idea, but uses occupancy grid maps for representing sets 
of range scans. Despite these approximations, we find that data asso- 
ciation and inference techniques yield favorable results in a large-scale 
mapping problem. 


¢ Results were also provided for a conjugate gradient implementation to the 
underlying least square problem. We noted that the general target func- 
tion of GraphSLAM can be optimized by any least squares technique. Cer- 
tain techniques, such as conjugate gradient, are significantly faster than 
the basic optimization technique in GraphSLAM. 


As noted in the introduction, EKF SLAM and GraphSLAM are extreme ends 
of a spectrum of SLAM algorithms. Algorithms that fall in between these 
extremes will be discussed in the next two chapters. References to further 
techniques can be found in the bibliographical remarks of the next few chap- 
ters. 
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Bibliographical Remarks 


Graphical inference techniques are well known in computer vision and photogrammetry, and 
are related to structure from motion and bundle adjustment (Hartley and Zisserman 2000; B et al. 
2000; Mikhail et al. 2001). The first mention of relative, graph-like constraints in the SLAM litera- 
ture goes back to Cheeseman and Smith (1986) and Durrant-Whyte (1988), but these approaches 
did not perform any global relaxation, or optimization. The algorithm presented in this chapter 
is loosely based on a seminal paper by Lu and Milios (1997). They were historically the first 
to represent the SLAM prior as a set of links between robot poses, and to formulate a global 
optimization algorithm for generating a map from such constraints. Their original algorithm 
for globally consistent range scan alignment used the robot pose variables as the frame of refer- 
ence, which differed from the standard EKF view in which poses were integrated out. Through 
analyzing odometry and laser range scans, their approach generated relative constraints be- 
tween poses that can be viewed as the edges in GraphSLAM; however, they did not phrase their 
method using information representations. Lu and Milios’s (1997) algorithm was first success- 
fully implemented by Gutmann and Nebel (1997), who reported numerical instabilities, possibly 
due to the extensive use of matrix inversion. Golfarelli et al. (1998) were the first to establish the 
relation of SLAM problems and spring-mass models, and Duckett et al. (2000, 2002) provided a 
first efficient technique for solving such problems. The relation between covariances and the in- 
formation matrix is discussed in Frese and Hirzinger (2001). Araneda (2003) developed a more 
detailed elaborate graphical model. 

The Lu and Milios algorithm initiated a development of offline SLAM algorithms that up to 
the present date runs largely parallel to the EKF work. Gutmann and Konolige combined their 
implementation with a Markov localization step for establishing correspondence when closing 
a loop in a cyclic environment. Bosse et al. (2003, 2004) developed Atlas, which is a hierarchi- 
cal mapping framework based on the decoupled stochastic mapping paradigm, which retains 
relative information between submaps. It uses an optimization technique similar to the one in 
Duckett et al. (2000) and GraphSLAM when aligning multiple submaps. Folkesson and Chris- 
tensen (2004a,b) exploited the optimization perspective of SLAM by applying gradient descent 
to the log-likelihood version of the SLAM posterior. Their Graphical SLAM algorithm reduced 
the number of variables to the path variables—just like GraphSLAM—when closing the loop. 
This reduction (which is mathematically an approximation since the map is simply omitted) 
significantly accelerated gradient descent. Konolige (2004) and Montemerlo and Thrun (2004) 
introduced conjugate gradient into the field of SLAM, which is known to be more efficient than 
gradient descent. Both also reduced the number of variables when closing large cycles, and 
report that maps with 10 features can be aligned in just a few seconds. The Levenberg Mar- 
quardt technique mentioned in the text is due to Levenberg (1944) and Marquardt (1963), who 
devised it in the context of least squares optimization. Frese et al. (2005) analyzed the efficiency 
of SLAM in the information form, and developed highly efficient optimization techniques us- 
ing multi-grid optimization techniques. He reports speed-ups of several orders of magnitude; 
the resulting optimization techniques are presently the state-of-the-art. Dellaert (2005) devel- 
oped efficient factorization techniques for the GraphSLAM constraint graph, specifically aimed 
at transforming the constraint graph into more compact versions while retaining sparseness. 

It should be mentioned that the intuition to maintain relative links between local entities is at 
the core of many of the submapping techniques discussed in the previous section—although it 
is rarely made explicit. Authors such as Guivant and Nebot (2001); Williams (2001); Tardós et al. 
(2002); Bailey (2002) report of data structures for minuting the relative displacement between 
submaps, which are easily mapped to information theoretic concepts. While many of these 
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algorithms are filters, they nevertheless share a good amount of insight with the information 
form discussed in this chapter. 

To our knowledge, the GraphSLAM algorithm presented here has never been published in 
the present form (an earlier draft of this book referred to this algorithm as extended information 
form). However, GraphSLAM is closely tied to the literature reviewed above, building on Lu 
and Milios's (1997) seminal algorithm. The name GraphSLAM bears resemblance to the name 
Graphical SLAM by Folkesson and Christensen (2004a); we have chosen it for this chapter be- 
cause graphs of constraints are the essence of this entire line of SLAM research. A number of 
authors have developed filters in information form, which address the online SLAM problem 
instead of the full SLAM problem. These algorithms will be discussed in the coming chapter, 
which explicitly addresses the problem of filtering. 

The GraphSLAM formulation of the SLAM problem relates to a decades-old discussion in the 
representation of spatial maps, which was already mentioned in the bibliographical remarks to 
Chapter 9. Information representations bring together two different paradigms of map repre- 
sentation: topological and metric. Behind this distinction is a decades-old debate about the 
representation of space, in people and for robots (Chown et al. 1995). Topological approaches 
were already discussed in the bibliographical remarks to Chapter 9. A key feature of topological 
representation pertains to the fact that they only specify relative information between entities in 
the map. Hence, they are free of the problem of finding a consistent metric embedding of this 
relative information. State-of-the-art topological tend to augment links with metric information, 
such as the distance between two locations. 

Just like topological representations, information-theoretic methods accumulate relative in- 
formation between adjacent objects (landmarks and robots). However, the relative map infor- 
mation is "translated" into metric embeddings by inference. In the linear Gaussian case, this 
inference step is loss-free and invertible. Computing the full posterior, including the covari- 
ance, requires matrix inversion. A second matrix inversion operation leads back to the original 
form of relative constraints. Thus, both the topological view and the metric view appear to be 
duals of each other, just like information-theoretic and probabilistic representations (or EKF and 
GraphSLAM). So maybe there shall be a unifying mathematical framework that embraces both, 
topological and metric maps? The reader shall be forewarned that this view has not yet been 
adopted by the mainstream research community. 


Exercises 


1. Wealready encountered bearing only SLAM in the exercises to the previous 
chapter, as a form of SLAM in which sensors can only measure the bearing 
of landmarks but not the range. We conjecture that GraphSLAM is better 
suited for this problem than the EKF. Why? 


2. In this question, you are asked to prove convergence results for a special 
class of SLAM problems: linear Gaussian SLAM. In linear Gaussian SLAM, 
the motion equation is of the simple additive type 


Zi ~ N(ae-1 + ut, R) 
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and the measurement equation is of the type 
St .— N(mj — tQ) 


where R and Q are diagonal covariances matrices, and m; is the feature 
observed at time t. You may assume that the number of landmarks is 
finite, that all landmarks are seen infinitely often and in no specific order, 
and that the correspondence is known. 


(a) Prove that for GraphSLAM the distance between any two landmarks 
converges to the correct distance with probability 1. 


(b) What does this proof entail for EKF SLAM? 


(c) Does GraphSLAM converge for the general SLAM problem with 
known correspondence? If so, argue why; if not, argue why not (with- 
out proof). 


3. The algorithm GraphSLAM reduce reduces the set of constraints by in- 
tegrating out the map variables, leaving a constraint system over robot 
poses only. Is it possible to instead integrate out the pose variables, so 
that the resulting network of constraints is defined only over map vari- 
ables? If so, what would the resulting inference problem be sparse? How 
would cycles in the robot's path affect this new set of constraints? 


4. The GraphSLAM algorithm in this chapter ignored the landmark signa- 
tures. Extend the basic GraphSLAM algorithm to utilize such signatures 
in its measurements and in its map. 


»js.on 000000 


Srjs.cn QOOOOO 


12.1 


(12.1) 


The Sparse Extended Information 
Filter 


Introduction 


The previous two chapters covered two extreme ends of a spectrum of 
SLAM algorithms. We already noted that EKF SLAM is proactive. Every 
time information is acquired, it resolves this information into a probability 
distribution—which is computationally expensive. The GraphSLAM algo- 
rithm is different: It simply accumulates information. We noted that such 
an accumulation is lazy: at the time of data acquisition, GraphSLAM simply 
memorizes the information it receives. To turn the accumulated information 
into a map, GraphSLAM performs inference. This inference is performed 
after all data is acquired. This makes GraphSLAM an offline algorithm. 

This raises the question as to whether we can devise an online filter al- 
gorithm that inherits the efficiency of the information representation. The 
answer is yes, but only with a number of approximations. The sparse extended 
information filter, or SEIF, implements an information solution to the online 
SLAM problem. Just like the EKF, the SEIF integrates out past robot poses, 
and only maintains a posterior over the present robot pose and the map. But 
like GraphSLAM and unlike EKF SLAM, SEIF maintains an information rep- 
resentation of all knowledge. In doing so, updating the SEIF becomes a lazy 
information shifting operation, which is superior to the proactive probabilis- 
tic update of the EKF. Thus, the SEIF can be seen as the best of both worlds: 
It runs online, and it is computationally efficient. 

As an online algorithm, the SEIF maintains a belief over the very same 
state vector as the EKF: 


"= (a) 
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iteration: 300 — errori 0,0952% 


Figure 12.1 Motivation for using an information filter for online SLAM. Left: Simu- 
lated robot run with 50 landmarks. Center: The correlation matrix of an EKF, which 
shows strong correlations between any two landmarks' coordinates. Right: The nor- 
malized information matrix of the EKF is naturally sparse. This sparseness leads to a 
SLAM algorithm that can be updated more efficiently. 


Here x; is the robot state, and m the map. The posterior under known corre- 
spondences is given by p(y: | z1:4, 14, C11). 

The key insight for turning GraphSLAM into an online SLAM algorithm is 
illustrated in Figure 12.1. This figure shows the result of the EKF SLAM 
algorithm in a simulated environment containing 50 landmarks. The left 
panel displays a moving robot along with its probabilistic estimate of the 
location of all 50 point features. The central information maintained by the 
EKF SLAM is a covariance matrix of these different estimates. The correla- 
tion, which is the normalized covariance, is visualized in the center panel 
of this figure. Each of the two axes lists the robot pose (location and ori- 
entation) followed by the 2-D locations of all 50 landmarks. Dark entries 
indicate strong correlations. We already discussed in the EKF SLAM chapter 
that in the limit, all feature coordinates become fully correlated—hence the 
checker-board appearance of the correlation matrix. 

The right panel of Figure 12.1 shows the information matrix Q, normal- 
ized just like the correlation matrix. As in the previous chapter, elements 
in this normalized information matrix can be thought of as constraints, or 
links, which constrain the relative locations of pairs of features in the map: 
The darker an entry in the display, the stronger the link. As this depiction 
suggests, the normalized information matrix appears to be sparse. It is dom- 
inated by a small number of strong links; and it possesses a large number of 
links whose values, when normalized, are practically zero. 
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; robot features 
link active passive 





normalized information matrix 


Figure 12.2 Illustration of the network of features generated by our approach. 
Shown on the left is a sparse information matrix, and on the right a map in which 
entities are linked whose information matrix element is non-zero. As argued in the 
text, the fact that not all features are connected is a key structural element of the 
SLAM problem, and at the heart of our constant time solution. 


The strength of each link is related to the distance of the corresponding 
features: Strong links are found only between nearby features. The more 
distant two features, the weaker their link. 

This sparseness is distinctly different from that in the previous chapter: 
First, there exist links between pairs of landmarks. In the previous chapter, 
no such links could exist. Second, the sparseness is only approximate: In fact, 
all elements of the normalized information matrix are non-zero, but nearly 
all of them are very close to zero. 

The SEIF SLAM algorithm exploits this insight by maintaining a sparse 
information matrix, in which only nearby features are linked through a non- 
zero element. The resulting network structure is illustrated in the right panel 
of Figure 12.2, where disks correspond to point features and dashed arcs to 
links, as specified in the information matrix visualized on the left. This dia- 
gram also shows the robot, which is linked to a small subset of all features 
only. Those features are called active features and are drawn in black. Storing 
a sparse information matrix requires space linear in the number of features in 
the map. More importantly, all essential updates in SEIF SLAM can be per- 
formed in constant time, regardless of the number of features in the map. 
This result is somewhat surprising, as a naive implementation of motion 
updates in information filters—as stated in Table 3.6 on page 76 一 requires 
inversion of the entire information matrix. 
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The SEIF is an online SLAM algorithm that maintains such a sparse in- 
formation matrix, and for which the time required for all update steps is 
independent of the size of the map for the case with known data association, 
and logarithmic is data association search is involved. This makes SEIF the 
first efficient online SLAM algorithm encountered in this book. 


Intuitive Description 


We begin with an intuitive description of the SEIF update, using graphical 
illustrations. Specifically, a SEIF update is composed of 4 steps: a motion 
update step, a measurement update step, a sparsification step, and a state 
estimation step. 

We begin with the measurement update step, depicted in Figure 12.3. Each 
of the two panels shows the information matrix maintained by the SEIF, 
along with the graph defined by the information links. Just as in Graph- 
SLAM, sensing a feature m, leads the SEIF to update the off-diagonal ele- 
ment of its information matrix, which links the robot pose estimate x; to the 
observed feature m4. This is illustrated in the left panel of Figure 12.3a. 

Sensing mz leads it to update the elements in the information matrix that 
link the robot pose x; and the feature m», as illustrated in Figure 12.3b. As we 
shall see, each of these updates correspond to local additions in the informa- 
tion matrix and the information vector. In both cases (information matrix and 
vector), this addition touches only elements that link the robot pose variable 
to the observed feature. As in GraphSLAM, the complexity of incorporating 
a measurement into a SEIF takes time independent of the size of the map. 

The motion update differs from GraphSLAM, since, as a filter, the SEIF 
eliminates past pose estimates. It is shown in Figure 12.4. Here a robot's 
pose changes; Figure 12.4a depicts a the information state before, and Fig- 
ure 12.4b after motion, respectively. The motion affects the information state 
in multiple ways. First, the links between the robot's pose and the features 
mı, Mz are weakened. This is a result of the fact that robot motion introduces 
new uncertainty, hence causes us to lose information about where the robot 
is relative to the map. However, this information is not entirely lost. Some 
of itis mapped into information links between pairs of features. This shift of 
information comes about since even though we lost information on the robot 
pose, we did not lose information on the relative location of features in the 
map. Whereas previously, those features were linked indirectly through the 
robot pose, they are now linked also directly after the update step. 
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Figure 12.3 The effect of measurements on the information matrix and the associ- 
ated network of features: (a) Observing mı results in a modification of the informa- 
tion matrix elements Q:,,m,. (b) Similarly, observing mə affects (2c ma. 
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Figure 12.4 The effect of motion on the information matrix and the associated 
network of features: (a) before motion, and (b) after motion. If motion is non- 
deterministic, motion updates introduce new links (or reinforce existing links) be- 
tween any two active features, while weakening the links between the robot and those 
features. This step introduces links between pairs of features. 


(a) X44 my m, m, X (b) X44 my m, m, Fii 
Xis 5 sparsification 
m, m, 
m, m, 
m, m, m O 
m, = m, 








Figure 12.5 Sparsification: A feature is deactivated by eliminating its link to the 
robot. To compensate for this change in information state, links between active fea- 


tures and/or the robot are also updated. The entire operation can be performed in 
constant time. 
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The shift of information from robot pose links to between-feature links 
is a key element of the SEIF. It is a direct consequence of using the infor- 
mation form as a filter, for the online SLAM problem. By integrating out 
past pose variables, we lose those links, and they are mapped back into the 
between-feature elements in the information matrix. This differs from the 
GraphSLAM algorithm discussed in the previous chapter, which never in- 
troduced any links between pairs of features in the map. 

For a pair of features to acquire a direct link in this process, both have to be 
active before the update, hence their corresponding elements linking them to 
the robot pose in the information matrix have to be non-zero. This is illus- 
trated in Figure 12.4: A between-feature link is only introduced between fea- 
tures mı and m». Feature ms, which is not active, remains untouched. This 
suggests that by controlling the number of active landmarks at any point in 
time, we can control the computational complexity of the motion update, and 
the number of links in the information matrix. If the number of active links 
remains small, so will the update complexity for the motion update, and so 
will the number of non-zero between-landmark elements in the information 
matrix. 

SEIF therefore employs a sparsification step, illustrated in Figure 12.5. The 
sparsification involves the removal of a link between the robot and an active 
feature, effectively turning the active feature into a passive one. In SEIFs, 
this arc removal leads to a redistribution of information into neighboring 
links, specifically between other active features and the robot pose. The time 
required for sparsification is independent of the size of the map. However, 
it is an approximation, one that induces an information loss in the robot's 
posterior. The benefit of this approximation is that it induces true sparseness, 
and hence makes it possible to update the filter efficiently. 

There exists one final step in the SEIF algorithm, which is not depicted in 
any of the figures. This step involves the propagation of a mean estimate 
through the graph. As was already discussed in Chapter 3, the extended 
information filter requires an estimate of the state u for linearization of the 
motion and the measurement model. SEIFs also require a state estimate for 
the sparsification step. 

Clearly, one could recover the state estimate through the equation u = 
Q-1€, where Q is the information vector, and £ the information state. How- 
ever, this would require solving an inference problem that is too large to be 
run at each time step. SEIFs circumvent the step by an iterative relaxation al- 
gorithm that propagates state estimates through the information graph. Each 
local state estimate is updated based on the best estimates of its neighbors in 
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the information graph. This relaxation algorithm converges to the true mean 
u. Since the information form is sparse in SEIFs, each such update requires 
constant time, though with the caveat that more than a finite number of such 
updates may be needed to achieve good results. To keep the computation 
independent of the size of the state space, SEIFs perform a fixed number of 
such updates at any iteration. The resulting state vector is only an approx- 
imation, which is used instead of the correct mean estimate in all updating 
steps. 


The SEIF SLAM Algorithm 


The outer loop of the SEIF update is depicted in Table 12.1. The algo- 
rithm accepts as input an information matrix €), 1, an information vector 
£i 11, and an estimate of the state Ht 1. It also accepts a measurement z+, 
a control u;, and a correspondence vector c;. The output of the algorithm 
SEIF SLAM known correspondences is a new state estimate, represented 
by the information matrix Q, and the information vector £j. The algorithm 
also outputs an improved estimate ji. 

As stated in Table 12.1, the SEIF update proceeds in four major steps. The 
motion update in Table 12.2 incorporates the control u, into the filter esti- 
mate. It does so through a number of computationally efficient operations. 
Specifically, the only components of the information vector/matrix that are 
modified in this update are those of the robot pose and the active features. 
The measurement update in Table 12.3 incorporates the measurement vector 
z, under known correspondence c,. This step is also local, just like the mo- 
tion update step. It only updates the information values of the robot pose 
and the observed features in the map. The sparsification step, shown in Ta- 
ble 12.4, is an approximate step: It removes active features by transforming 
the information matrix and the information vector accordingly. This step is 
again efficient; it only modifies links between the robot and the active land- 
marks. Finally, the state estimate update in Table 12.5, applies an amortized 
coordinate descent technique to recover the state estimate u+. This step once 
again exploits the sparseness of the SEIF, through which it only has to consult 
a small number of other state vector elements in each incremental update. 

Together, the entire update loop of the SEIF is constant time, in that the 
processing time is independent of the size of the map. This is in stark contrast 
to the only other online SLAM algorithm discussed so far—the EKF—which 
requires time quadratic the size of the map for each update. However, such 
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Algorithm SEIF SLAM known correspondences(Z, 1, (3, .1, 





Ht- 1, Ut, 2t, Ct): 
2 n SEIF motion update(£, 1, 1, Ju 31, Ut) 
= SEIF update state estimate(£,, );, ji) 
Mn SEIF measurement update(£,, Qi, Ht, Zt, c1) 
&,0,— SEIF sparsification(£;, Q4) 
return Ge On, pt 





Table 12.1 The Sparse Extended Information Filter algorithm for the SLAM Prob- 
lem, here with known data association. 





n? 





Algorithm SEIF motion update(£, 1, Q; 1, 1 1, Ut): 


100 0.…:0 
Bc 0 1 0 0-0 
0 0 1 0-0 
—— 
3N 
D Sin Nt-19 + - sin(p— 1,9 + wA) 
PA - COS Mt 一 1.0 一 四 cos(1—1,0 + X At) 
Up I 
0 0 = - COS [441,0 一 T cos(HUt 16 + wAt) 
0 0 v E Sin Mt-1,0 一 o sin(Uz-1,9 + w,At) 
0 0 0 
Hi = E TO, V, + UF O, 1 V, 
— Oa dA 
= ð, FT(R7' + F, 6, ET)! F, 6, 
5, o, — Kt 


& = £a + (At — Ke) iia + ù FT Or 
pi = bea + FT 6 
return £r, Qt, ju 





Table 12.2 The motion update in SEIFs. 
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1: Algorithm SEIF_measurement_update(€,, Qi, Ht, Zt, Ce): 
or 0 0 
2: Qi 一 0 og 0 
0 0 o; 
for all observed features zi = (ri Qi si)" do 
jae 
if landmark j never seen before 
Hija Ht: (f cost + noo) 
6: Hjy |=| Hey | + 于 | sialo + uo) 
His 5i 0 
"E endif 
8: s eee e 
by Hj,y T Hty 
9: q=6'6 
. vd 
10: ĉi = | atan2(0,,0,) 一 Hp 
Hj,s 
fe —/qdó, 0 0-0 —4/qó, J/qó, 0 0---0 
11: pai by dg -1 0.0 -ó, -ó, 0 0.…0 
q 0 0 0 0.…0 0 0 1 0-0 
YS Sn 
37-3 3j 
12: endfor 
13: & = & + 0, HiT Qr [ei - 8 - Hi m 
14 m= + HI OG Hi 
15: return £,, a 








Table 12.3 The measurement update step in SEIFs. 
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1: Algorithm SEIF sparsification(£;, Q4): 
2: define Fm, , Fr,mo, Fz as projection matrices 
from y, to mo, (z, mo}, and z, respectively 
8: Oe = u V EIU R Fm)! FZ, OO 
+ Q? Fimo (Eus Q? Pos Ius 0? 
— 0, Fy (FT QU Fa)! FPO, 
4: & = & + pu (à — ù) 
5: return &, Q 
Table 12.4 The sparsification step in SEIFs. 
1: Algorithm SEIF_update_state_estimate(é,, Q4, j+): 
2: for a small set of map features m; do 
0-0 1 O0 O.-.0 
3: F; = 0.0 0 1 0-0 
\ 一 一 \ 一 -一 
2(N—i) 2(i—1)r 
4: pis = (Fi u FP)? Fi [& — Qs pi + % FT Fi iu 
5: endfor 
6: for all other map features m; do 
Z: li, t = Big 
8: endfor 
100 0-0 
9: 元 二 0 1.0 O-:-0 
0 0 1 O-:-0 
\ 一 -一 
3N 
10: His, = (Fe Qi FT)! E, [Et — Qe ie +02 FT Fe i] 
11: return pu 








Table 12.5 The amortized state update step in SEIFs updates a small number of state 
estimates. 
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a “constant time SLAM” conjecture should be taken with a grain of salt: The 
recovery of state estimates is a computational problem for which no linear- 
time solution is presently known, if the environment possesses large cycles. 


Mathematical Derivation of the SEIF 


Motion Update 


The motion update in SEIF processes the control u; by transforming the in- 
formation matrix €);.., and the information vector £, , into a new matrix ÑQ, 
and vector £j. As usual, the bar in our notation indicates that this predic- 
tion is only based on the control; it does not yet take the measurement into 
account. 

The motion update in SEIFs exploits the sparseness of the informa- 
tion matrix, which makes it possible to perform this update in time 
independent of the map size n. This derivation is best started with 
the corresponding formula for the EKF. We begin with the algorithm 
EKF SLAM known correspondences in Table 10.1, page 314. Lines 3 and 5 
state the motion update, which we restate here for the reader's convenience: 





lh = m1 FTS 
Š = Gi X1 GI + eR F; 


The essential elements of this update were defined as follows: 


1 0 0 0-0 
FQ = 0 1 0 O---0 
0 0 1 0-0 
=e sin 141,0 + m sin(j; 1,0 + e AL) 
ô = E COS J41—1,0 一 ve cos(p—1,0 + wi A) 
QU; A 
0 0 E COS J41—1,0 一 e cos(44—1,0 + wAt) 
A = 0 0 vs sin 41,0 一 m sin(14—1,9 + 4 At) 
0 0 0 


G, = I+FTAF, 


In SEIFs, we have to define the motion update over the information vector £ 
and the information matrix €. From Equation (12.3), the definition of G; in 
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(12.7), and the information matrix equation 2 = X !, it follows that 


Or. [Gi Q} GI + FT BEES 


= [CEE A Fy) Q74 (+ FZ A Fp) + FT RE) 


A key insight is this update can be implemented in constant time—regardless 
of the dimension of Q. The fact that this is possible for sparse matrices ();—1 
is somewhat non-trivial, since Equation (12.8) seems to require two nested 
inversions of matrices of size (3N + 3) x (3N + 3). As we shall see, if Q,.., is 
sparse, this update step can be carried out efficiently. We define 


= -1 
$, = [G O04 GT] 
= [Gi] u- Get 
and hence Equation (12.8) can be rewritten as 
Q = [671+F7 R F] 


We now apply the matrix inversion lemma and obtain: 


9, 


[BE + FT R Fa] ` 
$,— ©, FT(R;! + Fy 9, FT) ! F, & 
————M———— 


= $i—k 


Here Ar is defined as indicated. This expression can be calculated in constant 
time if we can compute 6, in constant time from €), ,. To see that this is in- 
deed possible, we note that the argument inside the inverse, R7 ! +F, 6, FT, 
is 3-dimensional. Multiplying this inverse with F and F, induces a matrix 
that is of the same size as 2; however, this matrix is only non-zero for the 
3 x 3 sub-matrix corresponding to the robot pose. Multiplying this matrix 
with a sparse matrix (3;., (left and right) touches only elements for which 
the off-diagonal element in Q,—ı between the robot pose and a map feature 
is non-zero. Put differently, the result of this operation only touches rows and 
columns that correspond to active features in the map. Since sparsity implies 
that the number of active features in Q;..; is independent of the size of Q¢_1, 
the total number of non-zero elements in Ai is also O(1). Consequently, the 
subtraction requires O(1) time. 

It remains to be shown that we can calculate 9, from €. 4 in constant 
time. We begin with a consideration of the inverse of G;, which is efficiently 
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calculated as follows: 
G7! = U+FF AR) 

= (IL-F? I F,+ FTI P,+F7 AF)? 

R ead 

= (1-FIIF,+ FI (I+A) F)! 

= I-PFDIPFQEFI(I-A)!RED 

= I+FP(I+A)1-% 

—— 

= I+ 
By analogy, we get for the transpose [G7]! = (I+ FT AT F,)-1=I+?. 
Here the matrix V, is only non-zero for elements that correspond to the robot 
pose. Itis zero for all features in the map, and hence can be computed in con- 
stant time. This gives us for our desired matrix ©, the following expression: 
$, = (I-9I) Q 7+) 
Qy-1 + yr Q, 4 4 0,14 V+ VT OQ, 4 V, 


^ 





= 人 -1 十 和 


where V, is zero except for the sub-matrix corresponding to the robot pose. 
Since Wi , is sparse, A, is zero except for a finite number of elements, which 
correspond to active map features and the robot pose. 

Hence, $4, can be computed from Q:—ı in constant time, assuming that 
Q:—ı is sparse. Equations (12.11) through (12.13) are equivalent to lines 5 
through 9 in Table 12.2, which proves the correctness of the information ma- 
trix update in SEIF motion, update. 

Finally, we show a similar result for the information vector. From (12.2) 
we obtain 


be = pA + FP Or 

This implies for the information vector: 

& = Qi (Qr éci Fp A) 
= ù 9s & 1 £s FT br 
= (Que +w- — 0, + 9; - B) mE &4 + f£ F? Ot 
= (Q —9,- 9, u1 + 4-1) Qh G1 + FT 0 


=0 =0 
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(0, — Be + Di — 0,4) Ol; &-1 Oa OF, G1 0, FT 6 
SS |o, Ninny piel Ne 
inii = 入 = Mt-1 =I 


= 6a + (Àt — Ke) Ni 十 Q Fr Ôt 


Since A, and Ai are both sparse, the product (A; — &4) H-1 only contains 
finitely many non-zero elements and can be calculated in constant time. Fur- 
ther, FT ô; is a sparse matrix. The sparseness of the product 0, FT 6; follows 
now directly from the fact that Q; is sparse as well. 


Measurement Updates 


The second important step of SLAM concerns the update of the filter in accor- 
dance to robot motion. The measurement update in SEIF directly implements 
the general extended information filter update, as stated in lines 6 and 7 of 
Table 3.6, page 76: 


ù% = Oy de QU H, 
& = &+ HI Qe [ze — h(ps) — Hi pu] 


Writing the prediction 2, = h(ji;) and summing over all individual elements 
in the measurement vector leads to the form in lines 13 and 14 in Table 12.3: 


Q = Q+ M ni Qr 
& = 总 二》 HiT QU [zi — 2 — Hi pm] 
Here Q;, ô, q, and Hj are defined as before (e.g., Table 11.2 on page 348). 


Sparsification 


General Idea 


The Key step in SEIFs concerns the sparsification of the information matrix 
Q. Because sparsification is so essential to SEIFs, let us first discuss it in 
general terms before we apply it to the information filter. Sparsification is an 
approximation through which a posterior distribution is approximated by 
two of its marginals. Suppose a, b, and c are sets of random variables (not 
to be confused with any other occurrence of these variables in this book!), 
and suppose we are given a joint distribution p(a, b, c) over these variables. 
To sparsify this distribution, we have to remove any direct link between the 
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variables a and b. In other words, we would like to approximate p by a 
distribution p for which the following property holds: p(a | b,c) = p(a | c) 
and p(b | a,c) = p(b | c). In multivariate Gaussians, it is easily shown that 
this conditional independence is equivalent to the absence of a direct link 
between a and b. The corresponding element in the information matrix is 
Zero. 

A good approximation f is obtained by a term proportional to the product 
of the marginals, p(a, c) and p(b,c). Neither of these marginals retain de- 
pendence between the variables a and b, since they both contain only one 
of those variables. Thus, the product p(a,c) p(b,c) does not contain any 
direct dependencies between a and b; instead, a and b are conditionally in- 
dependent given c. However, p(a,c) p(b,c) is not yet a valid probability 
distribution over a, b, and c. This is because c occurs twice in this expression. 
However, proper normalization by p(c) yields a probability distribution (as- 
suming p(c) > 0): 


adu) = Pitre) pto) 
P(c) 
To understand the effect of this approximation, we apply the following trans- 
formation: 
p(a,b, c) p(a, c) p(b. c) 


p(a,b,c) = (9 





plac) p(b.c) 





= p(a,b,c) p(c) p(a,b,c) 
Y. a C Dals 
ee ai 


In other words, removing the direct dependence between a and b is equiva- 
lent to approximating the conditional p(a | b, c) by a conditional p(a | c). We 
also note (without proof) that among all approximations q of p where a and b 
are conditionally independent given c, the one described here is "closest" to 
p, where closeness is measured by the Kullback-Leibler divergence, a com- 
mon asymmetric measure of the “nearness” of one probability distribution 
to another. 

An important observation pertains to the fact that the original p(a | b,c) 
is at least as informative as p(a | c), the conditional that replaces p(a | b, c) in 
p. This is because p(a | b,c) is conditioned on a superset of variables of the 
conditioning variables in p(a | c). For Gaussians, this implies that the vari- 
ances of the approximation p(a | c) is equal or larger than the variance of the 
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original conditional, p(a | b,c). Further, the variances of the marginals f(a), 
p(b), and p(c) are also larger than or equal to the corresponding variances of 
p(a), p(b), and p(c). In other words, it is impossible that the variance shrinks 
under this approximation. 


Sparsification in SEIFs 


The SEIF applies the idea of sparsification to the posterior p(y | 
21:4, U1: Cit), to maintain an information matrix 1; that is sparse at all times. 
To do so, it suffices to deactivate links between the robot pose and individ- 
ual features in the map. If done correctly, this also limits the number of links 
between pairs of features. 

To see, let us briefly consider the two circumstances under which a new 
link may be introduced. First, observing a passive feature activates this fea- 
ture and hence introduces a new link between the robot pose and the very 
feature. Second, motion introduces links between any two active features. 
This consideration suggests that controlling the number of active features 
can avoid violation of both sparseness bounds. Thus, sparseness is achieved 
simply by keeping the number of active features small at any point in time. 

To define the sparsification step, it will prove useful to partition the set of 
all features into three disjoint subsets: 


m = mt 十 m? + m7 


where m* is the set of all active features that shall remain active. The set rn? 
are one or more active features that we seek to deactivate. Put differently, we 
seek to remove the links between m? and the robot pose. And finally, m^ 
are all currently passive features; they shall remain passive in the process 
of sparsification. Since m* U m? contains all currently active features, the 
posterior can be factored as follows: 


py | Zt, Urt C1:t) 
plzen m9, m*,m^ | Zt) Urt; Crt) 


= p(z | m?, m', m, Zi:t; Ult; Cit) p(m?, fmit, m | 21:t, Ult; C1:t) 





— p(zi | m?, m',m — 0, Zia; Us Crt) p(m9,m*,m^ | Z1:t; U1:t; Cit) 


In the last step we exploited the fact that if we know the active features m? 


and mt, the variable x, does not depend on the passive features m~. We can 
hence set m~ to an arbitrary value without affecting the conditional posterior 
over x4, p(x; | ME, MF, MT, zy4,u14, C14). Here we simply choose m^ = 0. 
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Following the sparsification idea discussed in general terms in the previ- 
ous section, we now replace p(x, | m?,m*,m^ = 0) by p(zi | m*,m- = 0) 
and thereby drop the dependence on m. 
p(zi,m | Zi, Ut; i:t) 

= plz | m*, m = 0, Zi:t; Ult; C1:t) p(m?, m*, m | Z1:t; U1:t; C1:t) 

This approximation is obviously equivalent to the following expression: 


万 (Zi， m | Zi:t; U1:t» C14) 


p(t, mt | m = 0, Zi:t, Ult; Cit) 





0 s = 
m Z1:t; U1:t, C1: 
p(m* | m- = 0, 21:4, 01:5 Crt) p(m TRO, | Lit) U1:t; 1:4) 


Mathematical Derivation of the Sparsification 


In the remainder of this section, we show that the algorithm 
SEIF sparsification in Table 12.4 implements this probabilistic calcula- 
tion, and that it does so in constant time. We begin by calculating the 
information matrix for the distribution p(z;,m?,m* | m^ = 0) of all 
variables but m~, and conditioned on m^ = 0. This is obtained by extracting 
the sub-matrix of all state variables but m~: 


0 T T 
Q; = Fr,m+,mo Toad Q Fons mo E ntm) 


With that, the matrix inversion lemma (Table 3.2 on page 50) leads to the fol- 


lowing information matrices for the terms p(z;, m* | m^ = 0, 21:4, uia. cia) 
and p(m* | m^ = 0, 21:4, 14, 014), denoted Q} and ?, respectively: 

OF = V- Fa, (Fimo 2% Fm Fimo % 

OF > Q? I Q? Fimo (ee Q? Eu) ET Q? 


Here the various F-matrices are projection matrices that project the full state 
y: into the appropriate sub-state containing only a subset of all variables— 
in analogy to the matrix Fy used in various previous algorithms. The final 
term in our approximation (12.25), p(m?,m* , m^ | zy, uia, 014), possesses 
the following information matrix: 


B = Q,-QOELCETO,E,) ! FTO, 


Putting these expressions together according to Equation (12.25) yields the 
following information matrix, in which the feature m? is now indeed deacti- 
vated: 


Oo = 01-02-03 
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= h% L Fn (FZ, $ EL MES DO9 


+ Q? Fumo $24 Q? Eos Beas Q? 
-Qu Fy (FE ù, Fa)! FE Q 


The resulting information vector is now obtained by the following simple 
consideration: 


& = Q, Pt 
(w — Q, + X) pe 
= Qim 十 (Q — £4) pi 
= & + (Q — u) be 


This completes the derivation of lines 3 and 4 in Table 12.4. 


Amortized Approximate Map Recovery 


The final update step in SEIFs is concerned with the computation of the mean 
p. Throughout this section, we will drop the time index from our notation, 
since it plays no role in the techniques to be discussed. So we will write ji 
instead of ji. 

Before deriving an algorithm for recovering the state estimate u from the 
information form, let us briefly consider what parts of ju are needed in SEIFs, 
and when. SEIFs need the state estimate u of the robot pose and the active 
features in the map. These estimates are needed at three different occasions: 


1. The mean is used for the linearization of the motion model, which takes 
place in lines 3, 4, and 10 in Table 12.2. 


2. It is also used for linearization of the measurement update, see lines 6, 8, 
10, 13 in Table 12.3. 


3. Finally, it is used in the sparsification step, specifically in line 4 in Ta- 
ble 12.4. 


However, we never need the full vector u. We only need an estimate of the 
robot pose, and an estimate of the locations of all active features. This is a 
small subset of all state variables in u. Nevertheless, computing these esti- 
mates efficiently requires some additional mathematics, as the exact approach 
for recovering the mean via u = (1^! € requires matrix inversion or the use 
of some other optimization technique—even when recovering a subset of 
variables. 


»js.on 000000 


(12.32) 


(12.33) 


(12.34) 


COORDINATE DESCENT 


(12.35) 


(12.36) 


12.6 Amortized Approximate Map Recovery 403 


Once again, the key insight is derived from the sparseness of the matrix 
Q. The sparseness enables us do define an iterative algorithm for recovering 
state variables online, as the data is being gathered and the estimates € and 
Q are being constructed. To do so, it will prove convenient to reformulate 
p=! ¿asan optimization problem. As we will show in just a minute, the 
state u is the mode 


B = argmax p(u) 
m 


of the following Gaussian distribution, defined over the variable ju: 


plu) = mexp(-im Qpu-£€" yu) 


Here y is a vector of the same form and dimensionality as u. To see that this 
is indeed the case, we note that the derivative of p(j:) vanishes at u = Q7! £: 


op(u ! 
s = g(-Qu-4£)exp(-iuTOQpn4£T u) = 0 
which implies €? p = £ or, equivalently, y = (1^! £. 

This transformation suggests that recovering the state vector u is equiva- 
lent to finding the mode of (12.33), which now has become an optimization 
problem. For this optimization problem, we will now describe an iterative 
hill climbing algorithm which, thanks to the sparseness of the information 
matrix. 

Our approach is an instantiation of coordinate descent. For simplicity, we 
state it here for a single coordinate only; our implementation iterates a con- 
stant number K of such optimizations after each measurement update step. 
The mode Å of (12.33) is attained at: 
fj = argmax exp (-3 u Ope m 

H 
= argmin lu Qu-—£T yu 
H 
We note that the argument of the min-operator in (12.35) can be written in a 


form that makes the individual coordinate variables u; (for the i-th coordi- 
nate of u) explicit: 


iu Quén = iM > OG ay — E pi 
er i 


where Q; ; is the element with coordinates (i, j) in the matrix Q, and £; if the 
i-th component of the vector £. Taking the derivative of this expression with 
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respect to an arbitrary coordinate variable u; gives us 


o 
T DODOM 45 DOE my = X ujmi 


Setting this to zero leads to the optimum of the i-th coordinate variable p; 
given all other estimates p: 


m = Gi e Baia 
j#i 


The same expression can conveniently be written in matrix notation. Here 
we define F; = (0...0 1 0...0) to be a projection matrix for extracting the 
i-th component from the matrix 2: 


m = (ROOF) F[€-Op+0 FF F, p 


This consideration derives our incremental update algorithm. Repeatedly 
updating 


pi — (ROR) E[E-Ouc-O FF E u] 


for some element of the state vector u; reduces the error between the left- 
hand side and the right-hand side of Equation (12.39). Repeating this update 
indefinitely for all elements of the state vector converges to the correct mean 
(without proof). 

As is easily seen, the number of elements in the summation in (12.38), and 
hence the vector multiplication in the update rule (12.40), is constant if Q is 
sparse. Hence, each update requires constant time. To maintain the constant- 
time property of our SLAM algorithm, we can afford a constant number of 
updates K per time step. This usually leads to convergence over many up- 
dates. 

However, a note of caution is in order. The quality of this approximation 
depends on a number of factors, among them the size of the largest cyclic 
structure in the map. In general, a constant number of K updates per time 
step might be insufficient to yield good results. Also, there exists a number of 
optimization techniques that are more efficient than the coordinate descent 
algorithm described here. A “classical” example is the conjugate gradient 
discussed in the context of GraphSLAM. In practical implementations it is 
advisable to rely on efficient optimization techniques to recover ju. 
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landmarks: 50 links: 1275 





(b) 








Figure 12.6 Comparison of (a) SEIF without sparsification with (b) SEIF using the 
sparsification step with 4 active landmarks. The comparison is carried out in a sim- 
ulated environment with 50 landmarks. In each row, the left panel shows the set of 
links in the filter, the center panel the correlation matrix, and the right panel the nor- 
malized information matrix. Obviously, the sparsified SEIF maintains many fewer 
links, but its result is less confident as indicated by its less-expressed correlation ma- 
trix. 


How Sparse Should SEIFs Be? 


A key question pertains to the degree of sparseness one should enforce in a 
SEIF. In particular, the number of active features in SEIFs determines the de- 
gree of sparseness. The sparseness trades off two factors: the computational 
efficiency of the SEIF, and the accuracy of the result. When implementing a 
SEIF algorithm, it is therefore advisable to get a feeling for this trade-off. 

The “gold standard" for a SEIF is the EKF, which avoids sparsification 
and also does not rely on relaxation techniques for recovering the state es- 
timate. The following comparison characterizes the three key performance 
measures that set sparse SEIFs apart from EKFs. Our comparison is based on 
a simulated robot world, in which the robot senses the range, proximity, and 
identity of nearby landmarks. 
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Figure 12.7 The comparison of average CPU time between SEIF and EKF. 
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Figure 12.8 The comparison of average memory usage between SEIF and EKF. 
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Figure 12.9 The comparison of root mean square distance error between SEIF and 
EKF. 
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1. Computation. Figures 12.7 compares the computation per update in 
SEIFs with that in EKFs; in both cases the implementation is optimized. 
This graph illustrates the major computational ramification of the proba- 
bilistic versus information representation in the filter. While EKFs indeed 
require time quadratic in the map size, SEIFs level off and require constant 
time. 


2. Memory. Figure 12.8 compares the memory use of EKFs with that of 
SEIFs. Here once again, EKFs scale quadratically, whereas SEIFs scale 
linearly, due to the sparseness of its information representation. 


3. Accuracy. Here EKFs outperform SEIFs, due to the fact that SEIFS require 
approximation for maintaining sparseness, and when recovering the state 
estimate u. This is shown in Figure 12.9, which plots the error of both 
methods as a function of map size. 


One way to get a feeling for the effect of the degree of sparseness can be 
obtained via simulation. Figure 12.10 plots the update time and the approx- 
imation error as a function of the number of active landmarks in the SEIF 
update, for a map consisting of 50 landmarks. The update time falls mono- 
tonically with the number of active features. Figure 12.11 shows the corre- 
sponding plot for the error, comparing the EKF with the SEIF at different 
degrees of sparseness. The solid line is the SEIF as described, whereas the 
dashed line corresponds to a SEIF that recovers the mean y; exactly. As this 
plot suggests, 6 active features seem to provide competitive results, at sig- 
nificant computational savings over the EKF. For smaller numbers of active 
features, the error increases drastically. A careful implementation of SEIFs 
will require the experimenter to vary this important parameter, and graph 
its effect on key factors as done here. 
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Figure 12.10 The update time of the EKF (leftmost data point only) and the SEIF, 
for different degrees of sparseness, as induced by a bound on the number of active 
features as indicated. 
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Figure 12.11 The approximation error EKF (leftmost data point only) and SEIF for 
different degrees of sparseness. In both figures, the map consists of 50 landmarks. 
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Incremental Data Association 


We will now turn our attention to the problem of data association in SEIFs. 
Our first technique will be the familiar incremental approach, which greed- 
ily identifies the most likely correspondence, and then treats this value as if it 
was ground truth. We already encountered an instance of such a greedy data 
association technique in Chapter 10.3, where we discussed data association 
in the EKF. In fact, the only difference between greedy incremental data asso- 
ciation in SEIFs and EKFs pertains to the calculation of the data association 
probability. As a rule of thumb, computing this probability is generally more 
difficult in an information filter than in a probabilistic filter such as the EKF, 
since the information filter does not keep track of covariances. 


Computing Incremental Data Association Probabilities 


As before, the data association vector at time t will be denoted c+. The greedy 
incremental technique maintains a set of data association guesses, denoted 
&14. In the incremental regime, we are given the estimated correspondences 
61:41 from previous updates when computing ¢;. The data associating step 
then pertains to the estimation of the most likely value for the data asso- 
ciation variable ĉ at time t. This is achieved via the following maximum 
likelihood estimator: 


Ct = argmax p(zi | 21:t-1, Uist, €14—1, Ct) 
Ct 


mE argmax | ple | ye, ce) P(Yt | 1:51, 91:4, Grt—1) dyt 
—— am 


Ct 
9,6: 


= argmax f [ve | Lt, Yers Ct) p(zt, ye, | Big ay Ur Obs d) dz dye, 


Ct 


Our notation p(2 | £t, Yc,, Ce) of the sensor model makes the correspondence 
variable c; explicit. Calculating this probability exactly is not possible in con- 
stant time, since it involves marginalizing out almost all variables in the map. 
However, the same type of approximation that was essential for the efficient 
sparsification can also be applied here as well. 

In particular, let us denote by m the combined Markov blanket of the 
robot pose x; and the landmark y.,. This Markov blanket is the set of all 
features in the map that are linked to the robot of landmark y,,. Figure 12.12 
illustrates this set. Notice that m? includes by definition all active land- 
marks. The spareness of Q, ensures that m contains only a fixed number 
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Figure 12.12 The combined Markov blanket of feature y,, and the observed features 
is usually sufficient for approximating the posterior probability of the feature loca- 
tions, conditioning away all other features. 


of features, regardless of the size of the map N. If the Markov blankets of 
x, and of ye, do not intersect, further features are added that represent the 
shortest path in the information graph between zt and of ye,- 

All remaining features will now be collectively referred to as mz: 
m, = m—mé,— {Ye} 


Ct 


The set mz, contains only features that have only a minor impact on the tar- 
get variables, x; and y-,. The SEIF approximates the probability p(£t, yc, | 
Z14—1, 1:4, €1:4—1) in Equation (12.41) by essentially ignoring these indirect 
influences: 


p(zt, ye, | Z1:t—1; Ul:t, C1:t-1) 

=e + m- ^ 十 一 

J [ve Je, ; Me,» Me, | Z14—1, U1: C1t-1) dma, dmz, 

= (x | má, ma, 64-1) 

A p t: Uc, Mers Me, Z14—15 U1:t; C1:t-1 
(má, | mg erat) pm, | erat) dig, dmz, 
P\Me, | Mers Z1:t—1, U1:t; C1:t—1) PAS, | Z1:t—1, U1:t; C1:t—1) Me, Me, 

RI (a |m}, m7 一 1，Z u4:,6€ ) 

nc p ts Yer Ct? Ct = He 1:t—1» V1:t» ©1:t-1 


+ EDAM A + 
p(m;. | Me, 一 ind Z1:t—1, ber Ĉr:t—1) dmg, 
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This probability can be computed in constant time if the set of features con- 
sidered in this calculation is independent of the map size (which it generally 
is). In complete analogy to various derivations above, we note that the ap- 
proximation of the posterior is simply obtained by carving out the submatrix 
corresponding to the two target variables: 


ont T T —1 
Xue, S Fs Mes Husa OQ QT Fiesse 
Bt, = Bt Fs ye, 


This calculation is constant time, since it involves a matrix whose size is in- 
dependent of N. From this Gaussian, the desired measurement probability 
in Equation (12.41) is now easily recovered. 

As in our EKF SLAM algorithm, features are labeled as new when the like- 
lihood p(z; | z14—1, 14, €14—1, €t) remains below a threshold a. We then 
simply set ¢ = Ny-1 + 1 and N; = Ny-1 + 1. Otherwise the size of the map 
remains unchanged, hence N; = N;. 1. The value €; is chosen that maximizes 
the data association probability. 

As a last caveat, sometimes the combines Markov blanket is insufficient, 
in that it does not contain a path between the robot pose and the landmark 
that is being tested for correspondence. This will usually be the case when 
closing a large cycle in the environment. Here we need to augment the set 
of features mz. by a set of landmarks along at least one path between me, 
and the robot pose r;,. Depending on the size of this cycle, the numbers of 
landmarks contained in the resulting set may now depend on N, the size of 
the map. We leave the details of such an extension as an exercise. 


Practical Considerations 


In general, the incremental greedy data association technique is brittle, as 
discussed in the chapters EKF SLAM. Spurious measurements can easily 
cause false associations and induce significant errors into the SLAM estimate. 
The standard remedy for this brittleness—in EKFs and SEIFs alike—pertains 
to the creation of a provisional landmark list. We already discussed this ap- 
proach in depth in Chapter 10.3.3, in the context of EKF SLAM. A provi- 
sional list adds any new feature that has not been previously observed into a 
candidate list, which is maintained separately from the SEIF. In the measure- 
ment steps that follow, the newly arrived candidates are checked against all 
candidates in the waiting list, reasonable matches increase the weight of cor- 
responding candidates, and not seeing a nearby feature decreases its weight. 
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Figure 12.13 The vehicle used in our experiments is equipped with a 2-D laser range 
finder and a differential GPS system. The vehicle’s ego-motion is measured by a lin- 
ear variable differential transformer sensor for the steering, and a wheel-mounted 
velocity encoder. In the background, the Victoria Park test environment can be seen. 
Image courtesy of José Guivant and Eduardo Nebot, Australian Centre for Field 
Robotics. 


When a candidate’s weight is above a certain threshold, it joins the SEIF net- 
work of features. 

We notice that data association violates the constant time property of 
SEIFs. This is because when calculating data associations, multiple features 
have to be tested. If we can ensure that all plausible features are already 
connected in the SEIF by a short path to the set of active features, it would 
be feasible to perform data association in constant time. In this way, the 
SEIF structure naturally facilitates the search of the most likely feature given 
a measurement. However, this is not the case when closing a cycle for the 
first time, in which case the correct association might be far away in the SEIF 
adjacency graph. 

We will now briefly turn our attention to an implementation of the SEIF 
algorithm using a physical vehicle. The data used here is a common bench- 
mark in the SLAM field. This data set was collected with an instrumented 
outdoor vehicle driven through a park in Sydney, Australia. 

The vehicle and its environment are shown in Figures 12.13 and 12.14, re- 
spectively. The robot is equipped with a SICK laser range finder and a system 
for measuring steering angle and forward velocity. The laser is used to detect 
trees in the environment, but it also picks up hundreds of spurious features 
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Figure 12.14 The testing environment: A 350 meter by 350 meter patch in Victoria 
Park in Sydney. Overlayed is the integrated path from odometry readings. Data and 
aerial image courtesy of José Guivant and Eduardo Nebot, Australian Centre for Field 
Robotics; results courtesy of Michael Montemerlo, Stanford University. 











Figure 12.15 The path recovered by the SEIF, is correct within +1m. Courtesy of 
Michael Montemerlo, Stanford University. 
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Figure 12.16 Overlay of estimated landmark positions and robot path. Images cour- 
tesy of Michael Montemerlo, Stanford University 


such as corners of moving cars on a nearby highway. The raw odometry, as 
used in our experiments, is poor, resulting in several hundred meters of error 
when used for path integration along the vehicle's 3.5km path. This is illus- 
trated in Figure 12.14, which shows the path of the vehicle. The poor quality 
of the odometry information along with the presence of many spurious fea- 
tures make this dataset particularly amenable for testing SLAM algorithms. 

The path recovered by the SEIF is shown in Figure 12.15. This path is 
quantitatively indistinguishable from the one produced by the EKF. The av- 
erage position error, as measured through differential GPS, is smaller than 
0.50 meters, which is small compared to the overall path length of 3.5 km. 
The corresponding landmark map is shown in Figure 12.16. It, too, is of com- 
parable accuracy as state-of-the-art EKF results. Compared with the EKF, the 
SEIF runs approximately twice as fast and consumes less than a quarter of 
the memory EKF uses. This saving is relatively small, but it a result of the 
small map size, and the fact that most time is spent preprocessing the sensor 
data. For larger maps, the relative savings are larger. 
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Branch-and-Bound Data Association 


SEIFs make it possible to define a radically different data association ap- 
proach, which can be proven to yield the optimal results (although possibly 
in exponential time). The technique is built on three key insights: 


* Just like GraphSLAM, SEIFs make it possible to add soft data association 
constraints. Given two features m; and mj, a soft data association con- 
straint is nothing else but an information link that forces the distance 
between m; and m; to be small. We already encountered examples of 
such soft links in the previous chapter. In sparse extended information 
filters, introducing such a link is a simple, local addition of values in the 
information matrix. 


* We can also easily remove soft association constraints. Just as introducing 
a new constraint amounts to a local addition in the information matrix, 
removing it is nothing else but a local subtraction. Such an "undo" oper- 
ation can be applied to arbitrary data association links, regardless when 
they were added, or when the respective feature was last observed. This 
makes it possible to revise past data association decisions. 


* The ability to freely add and subtract data associations arbitrarily enables 
us to search the tree of possible data associations in a way that is both 
efficient and complete—as will be shown below. 


To develop a branch-and-bound data association algorithm, it shall prove useful 
to consider the data association tree that defines the sequence of data associ- 
ation decisions over time. At each point in time, each observed feature can be 
associated with a number of other features, or considered a new, previously 
unobserved feature. The resulting tree of data association choices, starting at 
time t = 1 all the way to the present time, is illustrated in Figure 12.17a. 
Of course, the tree grows exponentially over time, hence searching it ex- 
haustively is impossible. The incremental greedy algorithm described in the 
previous section, in contrast, follows one path through this tree, defined by 
the locally most likely data associations. Such a path is visualized in Fig- 
ure 12.17a as the thick gray path. 

Obviously, if the incremental greedy approach succeeds, the resulting path 
is optimal. However, the incremental greedy technique may fail. Once 
a wrong choice has been made, the incremental approach cannot recover. 
Moreover, wrong data association decisions introduce errors in the map 
which, subsequently, can induce more errors in the data association. 
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Recursive Search 


The approach discussed in the remainder of this chapter generalizes the in- 
cremental greedy algorithm into a full-blown search algorithm for the tree 
that is provably optimal. Of course, searching all branches in the tree is in- 
tractable. However, if we maintain the log-likelihood of all nodes on the fron- 
tier of the tree expanded thus far, we can guarantee optimality. Figure 12.17b 
illustrates the idea: The branch-and-bound SEIF maintains not just a single 
path through the data association tree, but an entire frontier. Every time a 
node is expanded (e.g., through incremental ML), all alternative outcomes 
are also assessed and the corresponding likelihoods are memorized. This 
is illustrated in Figure 12.17b, which depicts the log-likelihood for an entire 
frontier of the tree. 

Finding the maximum in Equation (12.41) implies that the log-likelihood 
of the chosen leaf is greater or equal to that of any other leaf at the same 
depth. Since the log-likelihood decreases monotonically with the depth of 
the tree, we can guarantee that we have indeed found the optimal data asso- 
ciation values when the log-likelihood of the chosen leaf is greater or equal 
to the log-likelihood of any other node on the frontier. Put differently, when 
a frontier node assumes a log-likelihood greater than the one of the chosen 
leaf, there might be an opportunity to further increase the likelihood of the 
data by revising past data association decisions. Our approach then simply 
expands such frontier nodes. If an expansion reaches a leaf and its value 
is larger than the one of the present best leaf, this leaf is chosen as the new 
data association. Otherwise the search is terminated when the entire frontier 
possesses values that are all smaller or equal to the one of the chosen leaf. 
This approach is guaranteed to always maintain the best set of values for the 
data association variables; however, occasionally it might require substantial 
search. 


Computing Arbitrary Data Association Probabilities 


To test whether or not two features in the map should be linked, we now 
need a technique that, for any two features in the map, calculates the prob- 
ability of equality. This test is essentially the same as GraphSLAM’s corre- 
spondence test, stated in Table 11.8 on page 364. However, in SEIFs this test is 
approximate, in that the exact calculation of the log-likelihood would require 
additional computation. 

Table 12.6 lists an algorithm that tests the probability that two features in 
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Figure 12.17 (a) The data association tree, whose branching factor grows with 
the number of landmarks in the map. (b) The tree-based SEIF maintains the log- 
likelihood for the entire frontier of expanded nodes, enabling it to find alternative 


paths. (c) Improved path. 
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1: Algorithm SEIF correspondence test(Q, £, jj, Mmj, mx): 
2: let B(j) be the blanket of m; 
3: let B(k) be the blanket of mj, 
4: B= B(j) U B(k) 
5: if B(j) n B(k) 2 0 
5: add features along the shortest path between m; and m; to B 
7: endif 
0.0 1 0 0 O.--0 
0.0 0 1 0 O--0 
0...0 0 0 1 0-0 dese 
0---0 1 0 0 0-0 
8 Pg= 0---0 0 10 0-0 
0---0 0 0 1 0-0 
0...0 
0...0 
9: (size (3N + 3) by 3|B|) 
10: “p= (Fg Q fy 
11: pp — Xp Fp£ 
0.0 10 0-0 -10 0…0 
1: Fas=s| 0-0 01 0-0 0 -1 0.0 
“NY NS 一 一 
feature m; feature m; 
13: Xa 一 (FA Q T pus 
14: | uA — XA FAC 
1 
15: return det(2m XA) 2 exp(—l wh XX pa} 











Table 12.6 The SEIF SLAM test for correspondence. 
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the map are one and the same—this test is sufficient to implement the greedy 
data association technique. The key calculation here pertains to the recov- 
ery of a joint covariance and mean vector over a small set of map features 
B. To determine whether two features in the map are identical, SEIFs must 
consider the information links between them. Technically, the more links in- 
cluded in this consideration, the more accurate the result, but at the expense 
of increased computation. In practice, it usually suffices to identify the two 
Markov blankets of the features in question. A Markov blanket of a feature 
is the feature itself, and all other features that are connected via a non-zero 
element in the information matrix. In most cases, both Markov blankets in- 
tersect; if they do not, the algorithm in Table 12.6 identifies a path between 
the landmarks (which must exist if both were observed by the same robot). 

The algorithm in Table 12.6 then proceeds by cutting out a local informa- 
tion matrix and information vector, employing the very same mathemati- 
cal “trick” that led to an efficient sparsification step: SEIFs condition away 
features outside the Markov blankets. As a result, SEIFs obtain an efficient 
technique for calculating the desired probability, one that is approximate (be- 
cause of the conditioning), but works very well in practice. 

This result is interesting in that it not only enables SEIFs to make a data 
association decision, but it provides a way for calculating the log-likelihood 
of such a decision. The logarithm of the result of this procedure corresponds 
to the log-likelihood of this specific data item, and summing those up along 
the path in the data association tree becomes the total data log-likelihood 
under a specific association. 


Equivalence Constraints 


Once two features in the map have determined to be equivalent in the data 
association search, SEIFs add a soft link to the information matrix. Suppose 
the first feature is m; and the second is m;. The soft link constrains their 
position to be equal through the following exponential-quadratic constraint 
exp {—3 (mi — mj)! € (mi — m;)} 


Here C is a diagonal penalty matrix of the type 


oo 0 0 
C = 0 oo O0 
0 0 oo 


In practice, the diagonal elements of C are replaced by large positive values; 
the larger those values, the stronger the constraint. 
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It is easily seen that the non-normalized Gaussian (12.46) can be written 
as a link between m; and m; in the information matrix. Simply define the 
projection matrix 


0.0 100 0.0 —1 00 0-0 
T 0---0 01 0 0.00 -10 00 
mi,—m; = 0---0 001 0.000 -1 0 
— ` 


This matrix maps the state y; to the difference m; — m;. Thus, the expression 
(12.46) becomes 


exp {-$ (Emi; y)” C mim; yr) } 


Thus, to implement this soft constraint, SEIFs have to add eee C Fmi—-m; 
to the information matrix, while leaving the information vector unchanged: 


Qi m Q, ih D m C Fmi-m; 


Clearly, the additive term is sparse: it only contains non-zero off-diagonal 
elements between the features m; and m;. Once a soft link has been added, 
it can be removed by the inverse operation 


Q% — Mh- FL a, C Fmi; 


This removal can occur even regardless of the time that elapsed since a con- 
straint was introduced in the filter. However, careful bookkeeping is nec- 
essary to guarantee that SEIFs never remove a non-existent data associa- 
tion constraint—otherwise the information matrix may no longer be positive 
semidefinite, and the resulting belief might not correspond to a valid proba- 
bility distribution. 


Practical Considerations 


In any competitive implementation of this approach, there will usually only 
exist a small number of data association paths that are plausible at any point 
in time. When closing a loop in an indoor environment, for example, there 
are usually at most three plausible hypotheses: a closure, a continuation on 
the left, and a continuation on the right. But all should quickly become un- 
likely, so the number of times in which the tree is searched recursively should 
be small. 
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(a) 








(b) 











Figure 12.18 (a) Map with incremental ML scan matching and (b) full recursive 
branch-and-bound data association. Images courtesy of Dirk Hahnel, University of 
Freiburg. 


One way to make the data association succeed more often is to incorporate 
negative measurement information. Range sensors, which are brought to bear in 
our implementation, return positive and negative information with regards 
to the presence of objects in the world. The positive information are object 
detections. The negative information applies to the space between the detec- 
tion and the sensor. The fact that the robot failed to detect an object closer 
than its actual reading provides information about the absence of an object 
within the measurement range. 
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Figure 12.19 (a) Log-likelihood of the actual measurement, as a function of time. 
The lower likelihood is caused by the wrong assignment. (b) Log-likelihood, when 
recursively fixing false data association hypotheses through the tree search. The suc- 
cess is manifested by the lack of a distinct dip. 
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Figure 12.20 Example of the tree-based data association technique: (a) When clos- 
ing a large loop, the robot first erroneously assumes the existence of a second, parallel 
hallway. However, this model leads to a gross inconsistency as the robot encounters 
a corridor at a right angle. At this point, the approach recursively searches for im- 
proved data association decisions, arriving on the map shown in diagram (b). 


An approach that evaluates the effect of a new constraint on the overall 
likelihood considers both types of information: positive and negative. Both 
are obtained by calculating the pairwise (mis)match of two scans under their 
pose estimate. When using range scanners, one way to obtain a combination 
of positive and negative information is by superimposing a scan onto a local 
occupancy grid map built by another scan. In doing so, it is straightforward 
to determine an approximate matching probability for two local maps in a 
way that incorporates both the positive and the negative information. 

The remainder of this section highlights practical results achieved using 
SEIFs with tree-based data association. The left panel of Figure 12.18a depicts 
the result of incremental ML data association, which is equivalent to regular 
incremental scan matching. Clearly, certain corridors are represented doubly 
in this map, illustrating the shortcomings of the ML approach. The right 
panel, in comparison, shows the result. Clearly, this map is more accurate 
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Figure 12.21 (a) Path of the robot. (b) Incremental ML (scan matching) (c) Fast- 
SLAM. (d) SEIFs with lazy data association. Image courtesy of Dirk Hahnel, Univer- 
sity of Freiburg. 


than the one generated by the incremental ML approach. 

Figure 12.19a illustrates the log-likelihood of the most recent measurement 
(not the entire path), which drops significantly as the map becomes inconsis- 
tent. At this point, the SEIF engages in searching alternative data association 
values. It quickly finds the "correct" one and produces the map shown in 
Figure 12.18b. The area in question is shown in Figure 12.20, illustrating 
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the moment at which the likelihood takes its dip. The log-likelihood of the 
measurement is shown in Figure 12.19b. 

Finally, Figure 12.21 compares various techniques in the context of map- 
ping a large building with multiple cycles. 


Multi-Robot SLAM 


The SEIF is also applicable to multi-robot SLAM problems. The multi-robot 
SLAM problem involves several robots that independently explore and map 
an environment, with the eventual goal of integrating their maps into a sin- 
gle, monolithic map. In many ways, the multi-robot SLAM problem is rem- 
iniscent of the single-mapping problem, in that data needs to be integrated 
into a single posterior over time. However, the multi-robot problem is sig- 
nificantly more difficult in a number of dimensions: 


* In the absence of prior information on the relative location of two robots, 
the correspondence problem becomes a global problem. In principal, any 
two features in the map may correspond, and only through comparison of 
many features will the robots be able to determine good correspondences. 


* Each map will be acquired in a local coordinate frame, which may differ 
in absolute locations and orientations. Before integrating two maps, they 
have to be oriented and shifted. In SEIFs, this requires a re-linearization 
step of the information matrix and vector. 


* The degree of overlap between the different maps is unknown. For exam- 
ple, the robots might operate at different floors of a building with identical 
floor plans. In such a situation, the ability to differentiate the different 
maps may rely on small environmental features that differ, such as furni- 
ture that might be arranged slightly differently in the different floors. 


In this section, we will only sketch some of the main ideas necessary to im- 
plement an algorithm for multi-robot mapping. We will present an algo- 
rithm for integrating two maps once correspondence has been established. 
We will also discuss, but not prove, techniques for establishing global corre- 
spondence in multi-robot SLAM. 


Integrating Maps 


The critical subroutine for fusing maps under known correspondence is 
shown in Table 12.7. This algorithm accepts as an input two local robot pos- 
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11: 


12: 





Algorithm SEIF_map_fusion(/, £/, OF, £*, d, o, C): 


A — (d, dy a dy dy 0 --: dy dy 0)? 
cosa sina 0 ee 0 


—sina cosa 0 
0 0 1 


cosa sina 0 


—sina cosa 0 


0 ens 0 0 1 
Qi = AMAT 
EE = A (E — (975 A) 
Q* 0 
£k 
for any pair (mj, mp) € C^* do 
0O---0 100 0.0 -100 O0--.0 
F- 0O---0 01.0 0-0 0 -10 O0--.0 
| 00001 0---0 00 -1 0-0 
\ 一 一 一 ——— 
Mj Mk 


return Q, € 





Table 12.7 The map fusion loop in multi-robot mapping with SEIFs. 
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teriors, represented by the information form Q, £ and 0", £*, respectively. 
It also requires three other items 


1. A linear displacement tuple d 
2. A relative rotation angle a 
3. A set of feature correspondences C^^ 


The displacement vector d = (d, dy)’ and rotation a specify the relative 
orientation of the two robots' coordinate systems. In particular, the j-th robot 
pose xÍ and features in the map of the j-th robot are mapped into the k-th 
robot's coordinate frame through a rotation by a followed by a translation 
by d. Here we use “j — k” to denote the coordinates of an item in the j-th 
robot's map represented in the k-th robot coordinate frame. 


1. For the j-th robot pose 2? 


ZI 一 dy cosa sina 0 aj 
yk = d, | + | -sna cosa 0 y 
gi—k Q 0 0 1 93 
—— 
gi^* ad 


2. For each feature in the j-th robot’s map ml 


Ti dy cosa sina 0 m; , 
mij" = d, | - | -sina cosa 0 mi, 
L1 0 0 0 1 m] 。 
—— 
mi-^F mj 


i 


These two mappings are performed in lines 2 through 5 of the algorithm 
SEIF map fusion in Table 12.7. This step involves a local rotation and shift 
of the information matrix and the information vector, which preserved the 
sparseness of the SEIF. Afterwards, the map fusion proceeds by building a 
single joint posterior map, in lines 6 and 7. The final step in the fusion al- 
gorithm pertains to the correspondence list C^*. This set consists of pairs 
(mj, m.) of features that mutually correspond in the maps of robot j and 
robot k. The fusion is performed analogously to the soft equivalence con- 
straints considered in Chapter 12.9.3. Specifically, for any two correspond- 
ing features, we simply add large terms into the information matrix at the 
elements linking these two features. 
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We note that an alternative way to implement the map fusing step collapses 
the corresponding rows and columns of the resulting information matrix and 
vector. The following example illustrates the operation of collapsing feature 
2 and 4 in the filter, which would occur when our correspondence list states 
that feature 2 and 4 are identical: 


O11. O12. Qiz Qu 





Qui 124-014 Qs 

Q Q Q23 Q 
2r 2 23 21 — | Q2:-O41. Q22+942+9Q24+944 《223 十 (243 
Q31 O32. Q33 Osa eon Q--Q34 Q5 
Qai 42 Q43 O44 i ! Í - 

& & 

$2 — E2+64 

E3 &3 

£4 


Collapsing the information state exploits the additivity of the information 
state. 


Mathematical Derivation of Map Integration 


For the derivation, it shall prove successful to give the rotation matrix and 
the shift vectors in (12.52) and (12.53) names. Let us define the variables 6,, 
dm, and A as follows: 


ôs = (dd a)" 
Ôm = (d; dy 0)” 
| cosa sina 0 
AtS —sina cosa 0 
0 0 1 


We can then rewrite (12.52) and (12.53) as 


jk j 
T3 bn Am, 


quis ôm + Ami 


2 


For the full state vector we now obtain 
Wo = A+A 
with 


A = (dp Om, Ôm fet ôm)” 
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A, 0 0 
0 Am 0 
A = : 


The coordinate transformations needed in information space are similar. To 
see, let the posterior of the j-th robot at time t be defined through the infor- 
mation matrix Q/ and the information vector £/. The following transforma- 
tion applies the shift and rotation: 





n expí— gp ORT QI>k ijs q I IST. gj n 
(A+ Ay) Q7 (A A y!) (AA yi e^) 


Il 
3 
i?) 
tal 
iS 
一 一 
| 


= mexp4-i y? AT QJ^F* Ay p iT RISE A — 4AT QUON A 
N _ 
const. 


—— 
const. 


; RM 


br 


Q3 £j 
Thus, we have 
(ue = AE A 
& = (QIE A+ AT gi^*) 


From A^! = AT, it follows that 


WF = AMAT 
g^* = AE- A) 
This proves the correctness of lines 2 through 7 in Table 12.7. The remain- 


ing soft equality constraints follow directly from the deliberations in Chap- 
ter 12.9.3. 
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Establishing Correspondence 


The remaining problem pertains to establishing correspondence between dif- 
ferent maps, and calculating the rotation a and translation 6. There exists a 
myriad of possible approaches, hence we will only sketch one possible algo- 
rithm here. Clearly, the problem lies in the large number of features that can 
potentially be matched in both local maps. 

A canonical algorithm for landmark-based maps might seek to cache away 
local configurations of sufficiently nearby landmarks so that a comparison of 
such local configurations yields good candidates for correspondences. For 
example, one might identify sets of m nearby landmarks (for a small number 
m), and calculate relative distances or angles between them. Such a vector of 
distances or angles will then serve as statistics that can be used to compare 
two maps. Using hash tables or kd-trees, they may be efficiently accessi- 
ble, so that a query “do the following m landmarks in the j-th robot’s map 
correspond to any m landmarks in the map of robot k?” can be answered ef- 
ficiently, at least in approximation. Once an initial correspondence has been 
identified, we can easily calculate d and a by minimizing the quadratic dis- 
tance between these m features in both maps. 

The fusion then proceeds as follows: First, the fusion operator is called 
using the d, a, and CI computed from those m local features in both maps. 
Subsequently, additional landmarks are identified for which our correspon- 
dence test in Table 12.6 generates a probability that falls below a threshold. 
A simple termination may occur when no such pairs of landmarks can be 
found. 

A comparison of both components of the unified maps—and specifically 
of nearby landmarks that are not in correspondence—will then provide a 
criterion for acceptance of the resulting match. Formally, once the search 
has terminated, a fusion is accepted if the resulting reduction of the overall 
likelihood (in logarithmic form) is offset by the number of collapsed features 
times a constant; this effectively implements a Bayesian MAP estimator with 
an exponential prior over the number of features in the world. 

In general, we note that the search for the optimal correspondence is NP- 
hard. However, hill climbing tends to work extremely well in practice. 


Example 


Figure 12.22 shows an example of eight local maps. These maps are obtained 
by partitioning our benchmark data set discussed previously, into 8 disjoint 
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Figure 12.22 Eight local maps obtained by splitting the data into eight sequences. 

















Figure 12.23 A multi-robot SLAM result, obtained using the algorithm described in 
this chapter. Image courtesy of Yufeng Liu. 
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Step t = 3 


Step t — 62 











Step t — 65 


Step t — 85 
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Figure 12.24 Snapshots from our multi-robot SLAM simulation at different points 
in time. During Steps 62 through 64, vehicle 1 and 2 traverse the same area for the 
first time; as a result, the uncertainty in their local maps shrinks. Later, in steps 85 
through 89, vehicle 2 observes the same landmarks as vehicle 3, with a similar effect 
on the overall uncertainty. After 500 steps, all landmarks are accurately localized. 


srjs.cn 000000 


432 


12.12 


12 The Sparse Extended Information Filter 


subsequences, and running SEIF on each one of those in separation. 

By combining those local maps using m = 4 local features in a hash table 
for the correspondence search, the SEIF arrives reliably at the map shown 
in Figure 12.23. This map, when calculated through u = Q-1£, is not just 
the superposition of the individual local maps that participated. Instead, 
each local map is slightly bent in the process, which is a result of additively 
combining the information forms. 

Figure 12.24 shows a simulation of three air vehicles. The diagram illus- 
trates that through fusing maps, the uncertainty in each individual map is 
reduced. 


Summary 


This chapter has described an efficient solution to the online SLAM problem: 
the sparse extended information filter, of SEIF. The SEIF is similar to Graph- 
SLAM in that it represents the posterior in information form. However it 
differs in that past poses are integrated out, which results in an online SLAM 
algorithm. We learned: 


* When integrating out past poses, features that were observed from those 
poses become linked directly in the information matrix. 


* The information matrix tends to be dominated by a small number of 
between-features links that are found between physically nearby features. 
The further two features separated, the weaker their link. 


e By sparsifying the matrix, which amounts to shifting information through 
the SEIF in a way that reduces the number of links, the information matrix 
will remain sparse at all times. Sparseness implies that every element in 
this matrix is only linked through a non-zero information value to finitely 
many other elements, regardless of the total map size N. However, spar- 
sification is an approximation, and not an exact operation. 


* We observed that for sparse information matrix, both essential filtering 
steps can be carried out in time independent of the map size: the mea- 
surement step and the motion update step. In regular information filters, 
only the measurement update step requires constant time. The motion 
update step requires more time. 


* For a number of steps, the SEIF still requires a state estimate. The SEIF 
uses an amortized algorithm for recovering these estimates. 


»js.on 000000 


12.12 Summary 433 


* We discussed two techniques for data association. The first is identical to 
the one discussed for EKF SLAM: Incremental maximum likelihood. This 
technique associates measurements to the most likely one at each point in 
time, but never revises a correspondence decision. 


* An improved technique recursively searches the tree of all data associa- 
tions, so as to arrive at a data association vector that maximizes the like- 
lihood of all data associations together. It does this based on an online 
version of branch-and-bound. This techniques uses a lazy tree expansion 
technique, in which the log-likelihood values of the data are remembered 
along the fringe of a partially expanded tree. When the present best leaf 
arrives at a value that is inferior to a partially expanded value at the fringe, 
the fringe is expanded until it either becomes inferior itself or a better 
global solution is found to the data association problem. 


* We also discussed the use of SEIF in the context of multi-robot mapping. 
The algorithm uses as an inner loop a technique for rotating and shift- 
ing maps represented in information form, without ever computing the 
underlying map itself. This operation maintains the sparseness of the in- 
formation matrix. 


* An algorithm was sketched that makes it possible to efficiently carry out 
the global correspondence between two maps in the multi-robot map- 
ping problem. This algorithm hashes away local feature configurations 
and uses fast search techniques to establish correspondence. It then fuses 
maps recursively, and accepts a merge if the resulting maps fit well. 


The SEIF is our first efficient online SLAM algorithm in this book. It marries 
the elegance of the information representation with the idea of integrating 
out past poses. It is the "lazy" sister of the EKF: Whereas EKFs proactively 
spread the information of each new measurement through the network of 
features so as to calculate a correct joint covariance, the SEIF merely accu- 
mulates this information, and resolves it slowly over time. The tree-based 
data association in SEIF is also lazy: It only considered alternative paths to 
the best known one when necessary. This will be in contrast to the technique 
described in the coming chapter, which applies particle filters to the problem 
of data association. 

To attain efficient online, however, the SEIF has to make a number of ap- 
proximations, which make its result less accurate than that of GraphSLAM 
or the EKF. In particular, the SEIF has two limitations: First, it linearizes only 
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once, just like the EKF. GraphSLAM can re-linearize, which generally im- 
proves the accuracy of the result. Second, the SEIF uses an approximation 
step to maintain sparsity of its information matrix. This sparsity was natu- 
rally given for the GraphSLAM algorithm, by nature of the information that 
was being integrated. 

While each of the basic SEIF steps (with known correspondence) can be 
implemented in “constant time,” a final note of caution is in order. If SEIFs 
were applied to a linear system (meaning, we do not need Taylor series ap- 
proximations, and the data association is known), the update would be truly 
constant time. However, because of the need to linearize, we need an esti- 
mate of the mean j; along with the information state. This estimate is not 
maintained in the traditional information filter, and recovering it requires a 
certain amount of time. Our SEIF implementation only approximates it, and 
the quality of the posterior estimate depends on the quality of this approxi- 
mation. 


Bibliographical Remarks 


The literature on information-theoretic representations in SLAM was already discussed in the 
previous chapter, insofar it pertains to offline optimization. Information filters have a relatively 
young history in the SLAM field. In 1997, Csorba developed an information filter that main- 
tained relative information between triplets of three landmarks. He was possibly the first to 
observe that such information links maintained global correlation information implicitly, paving 
the way for algorithms with quadratic to linear memory requirements. Newman (2000); New- 
man and Durrant-Whyte (2001) developed a similar information filter, but left open the ques- 
tion how the landmark-landmark information links are actually acquired. Under the ambitious 
name “consistent, convergent, and constant-time SLAM,” Leonard and Newman further devel- 
oped this approach into an efficient alignment algorithm, which was successfully applied to an 
autonomous underwater vehicle using synthetic aperture sonar (Newman and Rikoski 2003). 
Another seminal algorithm in the field is Paskin’s (2003) thin junction filter algorithm, which 
represents the SLAM posterior in a sparse network known as thin junction trees (Pearl 1988; 
Cowell et al. 1999). The same idea was exploited by Frese (2004), who developed a similar tree 
factorization of the information matrix for efficient inference. Julier and Uhlmann developed 
a scalable technique called covariance intersection, which sparsely approximates the posterior in 
a way that provably prevents overconfidence. Their algorithm was successfully implemented 
on NASA's MARS Rover fleet (Uhlmann et al. 1999). The information filter perspective is also 
related to early work by Bulata and Devy (1996), whose approach acquired landmark models 
first in local landmark-centric reference frames, and only later assembles a consistent global map 
by resolving the relative information between landmarks. Finally, certain “offline” SLAM algo- 
rithms that solve the full SLAM problem, such as the ones by Bosse et al. (2004), Gutmann and 
Konolige (2000), Frese (2004), and Montemerlo and Thrun (2004), have shown to be fast enough 
to run online on limited-sized data sets. 

Multi-robot map merging is discussed in Gutmann and Konolige (2000). Nettleton et al. 
(2003) was the first to extended the information representation to multi-robot SLAM problems. 
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They realized that the additivity of information enabled the asynchronous integration of local 
maps across vehicles. They also realized that the addition of submaps led to efficient communi- 
cation algorithms, whereby the integration of maps would be possible in time logarithmic in the 
number of vehicles involved. However, they left open as to how to align such maps, a problem 
later addressed by Thrun and Liu (2003). 

The SEIF algorithm was developed by Thrun et al. (2002); see also Thrun et al. (2004a). To 
our knowledge, it is the first algorithm that derives the creation of information links between 
pairs of features from a filtering perspective. A greedy data association algorithm for SEIFs was 
developed by Liu and Thrun (2003), which was subsequently extended to multi-robot SLAM 
by Thrun and Liu (2003). The branch-and-bound data association search is due to Hahnel et al. 
(2003a), based on earlier branch-and-bound methods by Lawler and Wood (1966) and Narendra 
and Fukunaga (1977). It parallels work by Kuipers et al. (2004), who developed a similar data 
association technique, albeit not in the context of information theoretic concepts. SEIFs were 
applied to mapping problems of abandoned mines (Thrun et al. 2004c), involving maps with 
108 features. 

The Victoria Park dataset referenced in this chapter is due to Guivant et al. (2000). 


Exercises 


1. Compare the sparseness in GraphSLAM with the sparseness in SEIFs: 
what are the advantages and disadvantages of each? Provide conditions 
under which either would be clearly preferable. The more concise your 
reasoning, the better. 


2. An important concept for many SLAM researchers is consistency. The 
SLAM community defines consistency a bit different from the general 
field of statistics (in which consistency is an asymptotic property). 


Let x be a random vector, and N (u, X) be a Gaussian estimate of x. The 
Gaussian is said to be consistent if it meets the following two properties: 


Condition 1: Unbiasedness: The mean s is an unbiased estimator of x: 
El] = z 


Condition 2: No Overconfidence: The covariance X is not overconfident. 
Let = be the true the covariance of the estimator y: 


E = E((u-Elu) (u— Eu] 

Then X overconfident if there exists a vector z for which 

gx wo > z mg 

Overconfidence implies that the 95% confidence ellipse of the esti- 


mated covariance © falls inside or intersects with the true confidence 
ellipse of the estimator. 
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Proving consistency is generally difficult for SLAM algorithms. Here we 
want you to prove or disprove that sparsification retains consistency (see 
Equation (12.20)). In particular, prove or disprove the following conjec- 
ture: Given a consistent joint p(a, b, c) in Gaussian form, the following approxi- 
mation will also be consistent: 


pla, c) p(b. c) 


E p(c) 


. You are asked to implement the SEIF algorithm for linear Gaussian SLAM. 


In linear Gaussian SLAM, the motion equation is of the simple additive 


type 

xp ~ N (4 + ut, R) 

and the measurement equation is of the type 
z = N(mj- t, Q) 


where R and Q are diagonal covariances matrices. The data associations 
are known in linear Gaussian SLAM. 


(a) Run it on simple simulations and verify the correctness of your imple- 
mentation. 


(b) Graph the error of the SEIF as a function of the sparseness of the infor- 
mation matrix. What can you learn? 


(c) Graph the computation time of your SEIF implementation as a function 


of the sparseness of the information matrix. Report any interesting 
findings. 


4. The sparsification rule in SEIFs conditions away all passive features m^, 


by assuming m^ — 0. Why is this done? What would be the update 
equation if these features would not be conditioned away? Would the 
result be more accurate or less accurate? Would the computation be more 
or less efficient? Be concise. 


. At present, SEIFs linearize as soon as a measurement or a motion com- 


mand is integrated into the filter. Brainstorm about a SEIF algorithm that 
allows for retro-actively altering the linearization. How would the poste- 
rior of such an algorithm be represented? What would be the representa- 
tion of the information matrix? 
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The FastSLAM Algorithm 


We will now turn our attention to the particle filter approach to SLAM. We al- 
ready encountered particle filters in several chapters of this book. We noted 
that particle filters are at the core of some of the most effective robotics algo- 
rithms. This raises the question as to whether particle filters are applicable to 
the SLAM problem. Unfortunately, particle filters are subject to the curse of 
dimensionality: whereas Gaussians scale between linearly and quadratically 
with the number of dimensions of the estimation problem, particle filters 
scale exponentially! A straightforward implementation of particle filters for 
the SLAM problem would be doomed to fail, due to the large number of 
variables involved in describing a map. 

The algorithm in this chapter is based on an important characteristic of 
the SLAM problem, which has not yet been explicitly discussed in this book. 
Specifically, the full SLAM problem with known correspondences possesses 
a conditional independence between any two disjoint sets of features in the 
map, given the robot pose. Put differently, if an oracle told us the true robot 
path, we could estimate the location of all features independently of each 
other. Dependencies in these estimates arise only through robot pose uncer- 
tainty. This structural observation will make it possible to apply a version 
of particle filters to SLAM known as Rao-Blackwellized particle filters. Rao- 
Blackwellized particle filters use particles to represent the posterior over 
some variables, along with Gaussians (or some other parametric PDF) to rep- 
resent all other variables. 

FastSLAM uses particle filters for estimating the robot path. As we shall 
see, for each of these particles the individual map errors are conditionally 
independent. Hence the mapping problem can be factored into many sepa- 
rate problems, one for each feature in the map. FastSLAM estimates these 
map feature locations by EKFs, but using a separate low-dimensional EKF 


5rjs.cn 000000 


438 


13 The FastSLAM Algorithm 


for each individual feature. This is fundamentally different from SLAM al- 
gorithms discussed in previous chapters, which all use a single Gaussian to 
estimate the location of all features jointly. 

The basic algorithm can be implemented in time logarithmic in the number 
of features. Hence, FastSLAM offers computational advantages over plain 
EKF implementations and many of its descendants. The key advantage of 
FastSLAM, however, stems from the fact that data association decisions can 
be made on a per-particle basis. As a result, the filter maintains posteriors 
over multiple data associations, not just the most likely one. This is in stark 
contrast to all SLAM algorithms discussed so far, which track only a single 
data association at any point in time. In fact, by sampling over data asso- 
ciations, FastSLAM approximates the full posterior, not just the maximum 
likelihood data association. As shown empirically, the ability to pursue mul- 
tiple data associations simultaneously makes FastSLAM significantly more 
robust to data association problems than algorithms based on incremental 
maximum likelihood data association. 

Another advantage of FastSLAM over other SLAM algorithms arises from 
the fact that particle filters can cope with non-linear robot motion models, 
whereas previous techniques approximate such models via linear functions. 
This is important when the kinematics are highly non-linear, or when the 
pose uncertainty is relatively high. 

The use of particle filters creates the unusual situation that FastSLAM 
solves both the full SLAM problem and the online SLAM problem. As we shall 
see, FastSLAM is formulated to calculate the full path posterior—only the 
full path renders feature locations conditionally independent. However, be- 
cause particle filters estimate one pose at-a-time, FastSLAM is indeed an on- 
line algorithm. Hence it also solves the online SLAM problem. Among all 
SLAM algorithms discussed thus far, FastSLAM is the only algorithm that 
fits both categories. 

This chapter describes several instantiations of the FastSLAM algorithm. 
FastSLAM 1.0 is the original FastSLAM algorithm, which is conceptually 
simple and easy to implement. In certain situations, however, the particle 
filter component of FastSLAM 1.0 generates samples inefficiently. The algo- 
rithm FastSLAM 2.0 overcomes this problem through an improved proposal 
distribution, but at the expense of an implementation that is significantly 
more involved (as is the mathematical derivation). Both of these FastSLAM 
algorithms assume the feature-based sensor model discussed earlier. The ap- 
plication of FastSLAM to range sensors results in an algorithm that solves the 
SLAM problem in the context of occupancy grid maps. For all algorithms, 
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this chapter provides techniques for estimating data association variables. 


The Basic Algorithm 


Particles in the basic FastSLAM algorithm are of the form shown in Ta- 
ble 13.1. Each particle contains an estimated robot pose, denoted z", and 
a set of Kalman filters with mean ult | and covariance xr l, one for each fea- 
ture m; in the map. Here [k] is the index of the particle. As usual, the total 
number of particles is denoted M. 

The basic FastSLAM update step is depicted in Table 13.2. Barring the 
many details of the update step, the main loop is in large parts identical 
to the particle filter, as discussed in Chapter 4 of this book. The initial step 
involves the retrieval of a particle representing the posterior at time t — 1, and 
the sampling of a robot pose for time t using the probabilistic motion model. 
The step that follows updates the EKFs for the observed features, using the 
standard EKF update equation. This update is not part of the vanilla particle 
filter, but it is necessary in FastSLAM to learn a map. The final steps are 
concerned with the calculation of an importance weight, which are then used 
to resample the particles. 

We will now investigate each of these steps in more detail and derive them 
from the basic mathematical properties of the SLAM problem. We note that 
our derivation presupposes that FastSLAM solves the full SLAM problem, 
not the online problem. However, as shall become apparent further below in 
this chapter, FastSLAM is a solution to both of these problems: Each particle 
can be thought of as a sample in path space as required for the full SLAM 
problem, but the update only requires the most recent pose. For this reason, 
FastSLAM can be run just like a filter. 


Factoring the SLAM Posterior 


FastSLAM's key mathematical insight pertains to the fact that the full SLAM 
posterior, p(yis | 214,14) in Equation (10.2) can be written in the factored 
form 
N 

P(Yrt | zit uns Cit) = pria | Zit, wie crt) LH». | zi: Z1: Crt) 

n=1 
This factorization states that the calculation of the posterior over paths and 
maps can be decomposed into N + 1 probabilities. 
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Figure 13.1 Particles in FastSLAM are composed of a path estimate and a set of 
estimators of individual feature locations with associated covariances. 





* Dothe following M times: 


- Retrieval. Retrieve a pose all from the particle set Y;. ,. 


—- Prediction. Sample a new pose E ~ p(a | ltl ux). 


- Measurement update. For each observed feature 2; identify the 
correspondence j for the measurement zi, and incorporate the 
measurement z; into the corresponding EKF, by updating the mean 
ul and covariance XD 

- Importance weight. Calculate the importance weight w^! for the 
new particle. 


* Resampling. Sample, with replacement, M particles, where each 
particle is sampled with a probability proportional to wil, 











Figure 13.2 The basic steps of the FastSLAM algorithm. 
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Figure 13.3 TheSLAM problem depicted as Bayes network graph. The robot moves 
from pose z;-1 to pose x;45, driven by a sequence of controls. At each pose 2; it 
observes a nearby feature in the map m = (mi1,m»a, ma). This graphical network 
illustrates that the pose variables "separate" the individual features in the map from 
each other. If the poses are known, there remains no other path involving variables 
whose value is not known, between any two features in the map. This lack of a path 


renders the posterior of any two features in the map conditionally independent (given 
the poses). 


FastSLAM uses a particle filter to compute the posterior over robot paths, 
denoted p(zi« | 214, u14, C14). For each feature in the map, FastSLAM now 
uses a separate estimator over its location p(m;, | 21.4, cl:t) 21:4) one for each 
n = l,...,N. Thus, in total there are N + 1 posteriors in FastSLAM. The 
feature estimators are conditioned on the robot path, which means we will 
have a separate copy of each feature estimator, one for each particle. With M 
particles, the number of filters will actually be 1+ MN. The product of these 
probabilities represents the desired posterior in a factored way. As we shall 
show below, this factored representation is exact, not just an approximation. 
Itis a generic characteristic of the SLAM problem. 

To illustrate the correctness of this factorization, Figure 13.3 depicts the 
data acquisition process graphically, in the form of a dynamic Bayesian net- 
work. As this graph suggests, each measurement 21,..., z; is a functions of 
the position of the corresponding feature, along with the robot pose at the 
time the measurement was taken. Knowledge of the robot path separates 
the individual feature estimation problems and renders them independent 
of one another, in the sense that no direct path exists in this graphical de- 
piction from one feature to another that would not involve variables on the 
robot's path. Knowledge of the exact location of one feature will therefore tell 
us nothing about the locations of other features, if the robot path is known. 
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This implies that features are conditionally independent given the robot path, 
as stated in Equation (13.1). 

Before we discuss the implications of this property to the SLAM problem, 
let us briefly derive it mathematically. 


Mathematical Derivation of the Factored SLAM Posterior 
We will now derive Equation (13.1) from first principles. Clearly, we have 
P(Yr: | Z1:t; Urt C14) = DZ | Zit, Urt, cia) PIM | 14, Zi:t, C1:t) 


It therefore suffices to show that the second term on the right-hand side fac- 
tors as follows: 
N 

P(M | Eit, Crt; Zt) = II P(Mn | 214; Crt, 214) 

n=1 
We will prove this by mathematical induction. Our derivation requires the 
distinction of two possible cases, depending on whether or not the feature 
Mn Was observed in the most recent measurement. In particular, if c; 4 n, 
the most recent measurement z; has no effect on the posterior, and neither 
has the robot pose x; or the correspondence c;. Thus, we obtain: 


P(Mn | Fr, Cits Zit) = p(ma | 1421, 014-1, 214-1) 
If c, = n and hence m,, = me, was observed by the most recent measurement 
zı, the situation calls for applying Bayes rule, followed by some standard 
simplifications: 
P(t | ma Liit, Crit, Z1t—-1) p(me | zs cue 2c) 
P(t | 314; Crit, 214-1) 

P(t | Tt, mo; Ct) p( mo, | xi4—i, C1:t—1, 214-1) 

p(z | Lit, Crt, 21:41) 
This gives us the following expression for the probability of the observed 
feature m,,: 





p(m., | Liit, Cit, Z1) m 





p(me, | zi; clt Zt) DC | zie eue Z1:t—1) 
(2 | Lt, Me, Ct) 

The proof of the correctness of (13.3) is now carried out by induction. Let us 

assume that the posterior at time t — 1 is already factored: 





p(m., | Z14—1, C1:t—1, 214-1) = 


N 


p(m | T1:t—1; Clt-1; Ž1:t—1) = II p(m, | 214—1; Cl:t—1; Z1:t-1) 
n=1 
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This statement is trivially true for t = 1, since in the beginning of time the 
robot has no knowledge about any feature; hence all estimates must be inde- 
pendent. At time t, the posterior is of the following form: 


p(zi | m, 23:4, Ciit) Z1:t—1) p(m | Tiits Cie) Z1:t—1) 
plz | 21:5 Clits Z1:t-1) 
p(Ze | Ce, mo, Ct) p(m | 21: 1, 14-1, 214-1) 


plz | 1:t) C1:t; Zi:t—1) 





p(m | T1:5C1l:t Z4) 





Plugging in our inductive hypothesis (13.7) gives us: 


p(m | T1: Cl; Z1:t) 


N 

_ p(z | Lt, Mes, Ct) 

T II P(Mn | L1t—1; Clt—1; Ž1:t—1) 
plz | 21:45 Clits Z1:—1) 253 





p(zi | 24, Me, , Ct) 
= D(me | dad cia - 159041) 
plz | T1:t, Ĉ1:t; 214-1) —————— 





Eq. (13.6) 


II p(mn | T1:t-1) C1:t-1; 21-1) 
一 


NEC Eq. (13.4) 


= p(m., | 21:5 Ciit; Zi) II P(Mn | 21:5 Ciit; Z1:t) 
nci 
N 
= LH». | Tiit, Clit, 714) 


n=1 


Notice that we have substituted Equations (13.4) and (13.6) as indicated. This 
shows the correctness of Equation (13.3). The correctness of the main form 
(13.1) follows now directly from this result and the following generic trans- 
formation: 


Pye | Zit, Ure, Crt) = pua | zue was 01a) p(m | za, zi ua, Crt) 
p(zia | Zits Uit; Cit) p(m | List, Clit; Z1:t) 
N 


= plist | Zit Urt crt) [| plmn | zu cxt, 212) 
n=1 


We note that conditioning on the entire path x1., is indeed essential for this 
result. Conditioning on the most recent pose x; would be insufficient as con- 
ditioning variable, as dependencies may arise through previous poses. 
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Figure 13.4 Samples drawn from the probabilistic motion model. 


FastSLAM with Known Data Association 


The factorial nature of the posterior provides us with significant computa- 
tional advantages over SLAM algorithms that estimate an unstructured pos- 
terior distribution. FastSLAM exploits the factored representation by main- 
taining MN + 1 filters, M for each factor in (13.1). By doing so, all MN +1 
filters are low-dimensional. 

As noted, FastSLAM estimates the path posterior using a particle filter. 
The map feature locations are estimated using EKFs. Because of our fac- 
torization, FastSLAM can maintain a separate EKF for each feature—which 
makes the update more efficient than in EKF SLAM. Each individual EKF is 
conditioned on a robot path. Hence, each particle possesses its own set of 
EKFs. In total there are NM EKFs, one for each feature in the map and one 
for each particle in the particle filter. 

Let us begin with the FastSLAM algorithm in the case of known data asso- 
ciation. Particles in FastSLAM will be denoted 


k k k k k k 
yl ] = (al ]， ull, a iaia TUR M 


As usual, the bracketed notation [k] indicates the index of the particle; oll is 
the path estimate of the robot, and ut l and zh ] are the mean and variance 
of the Gaussian representing the n-th feature location, relative to the k-th 
particle. Together, all these quantities form the k-th particle Y], of which 
there are a total of M in the FastSLAM posterior. 

Filtering, or calculating the posterior at time t from the one at time t — 1, 
involves generating a new particle set Y, from Y;. ,, the particle set one time 
step earlier. This new particle set incorporates a new control u; and a mea- 
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surement z with associated correspondence c;. This update is performed in 
the following steps: 


1. Extending the path posterior by sampling new poses. FastSLAM 1.0 
uses the control u; to sample new robot pose z; for each particle in Y; . 
More specifically, consider the k-the particle YH. FastSLAM 1.0 samples 
the pose x; in accordance with this k-th particle, by drawing a sample 
according to the motion posterior 


z s play | all, us) 


Here zz 的， is the posterior estimate for the robot location at time t — 1, 
residing in the k-th particle. The resulting sample z 的 is then added to a 
temporary set of particles, along with the path of previous poses, ale. 
The sampling step is graphically depicted in Figure 13.4, which illustrates 
a set of pose particles drawn from a single initial pose. 


2. Updating the observed feature estimate. Next, FastSLAM 1.0 updates 


the posterior over the feature estimates, represented by the mean T ] ， 


and the covariance xl TN The updated values are then added to the 
temporary particle set, along with the new pose. 


The exact update equation depends on whether or not a feature m, was 
observed at time t. For n £ c, we did not observe feature n, we already 
established in Equation (13.4) that the posterior over the feature remains 
unchanged. This implies the simple update: 


k k k k 
Qum XM s CR xd 


For the observed feature n — c;, the update is specified through Equation 
(13.5), restated here with the normalizer denoted by 77: 


p(m., | U1:ty Z1: C14) 


= n p(z | Lt, Midis Ct) p(me, | L115 Ž1:t—1; CLt-1) 


The probability p(me, | 214-1, C1:t—1, 214-1) at time t — 1 is represented 
by a Gaussian with mean hi 1.1 and covariance zi I For the new esti- 


mate at time t to be Gaussian as well, FastSLAM linearizes the perceptual 
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model p(z | £t, m, ci) in the same way as EKF SLAM. As usual, we 
approximate the measurement function h by Taylor expansion: 


k k k k] [k k 
hma, a! ]) 2x ligul, oue 1) + h'(al lul, (mea z He) 
一 一 一 -一 
=; all = HI! 
alk k k 
= A+ Af (me, — iD.) 


Here the derivative h’ is taken with respect to the feature coordinates me, . 
This linear approximation is tangent to h at oll and phl. Under this 
approximation, the posterior for the location of feature c; is indeed Gaus- 
sian. The new mean and covariance are obtained using the standard EKF 
measurement update: 


k k k]T k k k]T = 
KP = YO HE (He Ben He Q 
Mee = Moyea t+ KE a) 

k k k k 
EX. = Q- KPHPYED,, 


Steps 1 and 2 are repeated M times, resulting in a temporary set of M 
particles. 


. Resampling. In a final step, FastSLAM resamples this set of particles. We 


already encountered resampling in a number of algorithms. FastSLAM 
draws from its temporary set M particles (with replacement) according to 
a yet-to-be-defined importance weight. The resulting set of M particles 
then forms the new and final particle set, Y;. The necessity to resample 
arises from the fact that the particles in the temporary set are not dis- 
tributed according to the desired posterior: Step 1 generates poses Zi only 
in accordance with the most recent control u;, paying no attention to the 
measurement z;. As the reader should know well by now, resampling is 
the common technique in particle filtering to correct for such mismatches. 


This situation is once again illustrated in Figure 13.5, for a 1-D example. 
Here the dashed line symbolizes the proposal distribution, which is the dis- 
tribution at which particles are generated, and the solid line is the target 
distribution. In FastSLAM, the proposal distribution does not depend on 
Zt, but the target distribution does. By weighing particles as shown in 
the bottom of this figure, and resampling according to those weights, the 
resulting particle set indeed approximates the target distribution. 
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Proposal 一 一 = 和/ 
/ 
Target 


Samples from 
proposal distribution 
Weighted samples 


Figure 13.5 Samples cannot be drawn conveniently from the target distribution 
(shown as a solid line). Instead, the importance sampler draws samples from the pro- 
posal distribution (dashed line), which has a simpler form. Below, samples drawn 
from the proposal distribution are drawn with lengths proportional to their impor- 
tance weights. 


To determine the importance factor, it will prove useful to calculate the 
actual proposal distribution of the path particles in the temporary set. 
Under the assumption that the set of path particles in Y;. ; is distributed 
according to p(Z1 1 | 214-1, 1451; C14—1) (which is an asymptotically 
correct approximation), path particles in the temporary set are distributed 
according to: 





k 
p(z | Zi:t—1; Ut; C1it—1) = plaf! | chl ue) plal | Z1:t—1; Ul:t—1; C1:t—1) 


The factor plal! | zl, ut) is the sampling distribution used in Equation 
(13.12). 


The target distribution takes into account the measurement at time z;, along 
with the correspondence ct: 


k 
plal | Zi:t; U1:t» C1:t) 


The resampling process accounts for the difference of the target and the 
proposal distribution. As usual, the importance factor for resampling is 
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given by the quotient of the target and the proposal distribution: 





(jj _ target distribution 
t proposal distribution 


k 
plait | Zi, Ut; C1it) 





k 
plal | Mite Chast) 
k 
= nz | atc) 


The last transformation is a direct consequence of the following transfor- 
mation of the enumerator in (13.21): 


plaf | 21:05 01:5 C1:t) 


= n p(z | all, ziti, Urt, Crt) plc” | Z1:t-15 01:5 C1:t) 


n pz | a! c) plat | Z1:t—1, U1:t, C1:t—1) 


To calculate the probability p(z; | z , c+) in (13.21), it will be necessary 
to transform it further. In particular, this probability is equivalent to the 
following integration, where we once again omit variables irrelevant to 
the prediction of sensor measurements: 


w = m / p(zi | ma, al, cz) p(me, | £, e) dme 


k k 
= mn [ve | Ma, x! l cy) p(m., [als Gig queen) dme, 
一 一 


k k 
e Adan m) 


Here N (x; u, =) denotes a Gaussian distribution over the variable x with 
mean u and covariance X. 


The integration in (13.23) involves the estimate of the observed feature 
location at time t and the measurement model. To calculate (13.23) in 
closed form, FastSLAM employs the very same linear approximation used 
in the measurement update in Step 2. In particular, the importance factor 
is given by 


1 i - n 
wf! mn 2RQM 3 exo[- 36 - QF - 4) 


with the covariance 


k]T lk] 


k k 
pos HPUEA HO EQ. 
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(a) (b) 




















Figure 13.6 Mismatch between proposal and posterior distributions: (a) illustrates 
the forward samples generated by FastSLAM 1.0, and the posterior induced by the 
measurement (ellipse). Diagram (b) shows the sample set after the resampling step. 


This expression is the probability of the actual measurement z, under the 
Gaussian. It results from the convolution of the distributions in (13.23), ex- 
ploiting our linear approximation of h. The resulting importance weights 
are used to draw with replacement M new samples from the temporary 
sample set. Through this resampling process, particles survive in propor- 
tion of their measurement probability. 


These three steps together constitute the update rule of the FastSLAM 1.0 
algorithm for SLAM problems with known data association. We note that 
the execution time of the update does not depend on the total path length t. 
In fact, only the most recent pose all is used in the process of generating 
a new particle at time t. Consequently, past poses can safely be discarded. 
This has the pleasing consequence that neither the time requirements nor 
the memory requirements of FastSLAM depend on the total number of time 
steps spent on data acquisition. 

A summary of the FastSLAM 1.0 algorithm with known data association is 
provided in Table 13.1. For simplicity, this implementation assumes that only 
asingle feature is measured at each point in time. This algorithm implements 
the various update steps in a straightforward manner. Its implementation is 
relatively straightforward; in fact, FastSLAM 1.0 happens to be one of the 
easiest SLAM algorithms to implement! 
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Algorithm FastSLAM 1.0 known correspondence(:;, Ct, ut, Y:—1): 


fork = 1 to M do 
retrieve um Os aa pereo Cubs eh.) from Y; 


/ / loop over all particles 


zl ~ p(ae | sl, ue) 


// sample pose 


ja / / observed feature 
if a j never seen before 
w= ao ee zi) 
H = h'(x} NU 
xl" = H^ Q; (H77 


wel 


// initialize mean 
// calculate Jacobian 
// initialize covariance 


= po // default importance weight 
else 
= htl s a1) // measurement prediction 
H =k (a! ED // calculate Jacobian 
Q=H NR HT + Q / / measurement covariance 
K= XP H” Q // calculate Kalman gain 
pll = TL + K(a — 2) // update mean 
x =(I- KH yah // update covariance 
w! = |2rQ|- 2 exp [(-à( Zt — Ên)? 
QU (a — ĉn) } // importance factor 
endif 
for all other features j' 4 j do // unobserved features 
T i pl vd / / leave unchanged 
xd osi 
endfor 
endfor 
Y, = // initialize new particle set 
do M times // resample M particles 
draw random k with probability c wl // resample 


add (up , (li t S ss m x» to Yı 


endfor 


return Y, 





Table 13.1  FastSLAM 1.0 with known correspondence. 
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13.4.1 
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Improving the Proposal Distribution 


FastSLAM 2.0 is largely equivalent to FastSLAM 1.0, with one important ex- 
ception: Its proposal distribution takes the measurement z into account, 
when sampling the pose z+. By doing so it can overcome a key limitation 
of FastSLAM 1.0. 

On the surface, the difference looks rather marginal: The reader may re- 
call that FastSLAM 1.0 samples poses based on the control u; only, and then 
uses the measurement z; to calculate importance weights. This is problem- 
atic when the accuracy of control is low relative to the accuracy of the robot's 
sensors. Such a situation is illustrated in Figure 13.6. Here the proposal 
generates a large spectrum of samples shown in Figure 13.6a, but only a 
small subset of these samples have high likelihood, as indicated by the el- 
lipsoid. After resampling, only particles within the ellipsoid "survive" with 
reasonably high likelihood. FastSLAM 2.0 avoids this problem by sampling 
poses based on the measurement z; in addition to the control u;. Thus, as 
a result, FastSLAM 2.0 is more efficient than FastSLAM 1.0. On the down- 
side, FastSLAM 2.0 is more difficult to implement than FastSLAM 1.0, and 
its mathematical derivation is more involved. 


Extending the Path Posterior by Sampling a New Pose 


In FastSLAM 2.0, the pose z is drawn from the posterior 


k k 
] ~ plas | aD. a uias Zit, cna) 


z! 
This distribution differs from the proposal distribution provided in (13.12) 
in that (13.26) takes the measurement z; into consideration, along with the 
correspondence c;. Specifically, the expression in (13.26) conditions on 21.4, 
whereas the pose sampler in FastSLAM 1.0 conditions on z;.4. 1. 

Unfortunately, it also comes with more complex math. In particular, the 
mechanism for sampling from (13.26) requires further analysis. First, we 
rewrite (13.26) in terms of the "known" distributions, such as the measure- 
ment and motion models, and the Gaussian feature estimates in the k-th par- 
ticle. 


k 
p(zi | aM. U1:t; Z1:t; C1:t) 


( | [k] [k] 
Bayes P\4t 24,234 1 Ut; Ž1:t—1; Crit) p(x | Dros Ult eta 1 C1:t) 





k 
pa | ahl, U1:t) Z1:t—1; C14) 


k k 
= ni! plz | Tt, chl, U1:t; Z1:t-15 C14) plz: | chl, Uti Z1:t—1; cit) 
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Mark k k 
akov o QU (zy | ae, oy, ues zia cre); pler | 30, ug) 
k 
一 nll [ve | Meo, Ti ZU]. 1) ula, 1-1, Cut) 
( [x d [i 
p(me, | Zt, 234. 4, Ut, 21451, Crt) dme, p(zi | v; 74, Ut) 


Mark i 
MM jl i pz | Te, , Xt, Ct) p(m., | al i, Ziti; Crei) dme, 
gE AE a M DOSE M EL SPIUEMACROR 


~ N Gash(me 24), Qt) [k] x 


f N (mere t1? e,,t—1) 


[k] 
p(zi | T, 1, Ut) 
—— 
~ N Gresg (2l ut), Re) 


This expression makes apparent that our sampling distribution is truly the 
convolution of two Gaussians multiplied by a third. In the general SLAM 
case, the sampling distribution possesses no closed form from which we 
could easily sample. The culprit is the function A: If it were linear, this prob- 
ability would be Gaussian, a fact that shall become more obvious below. In 
fact, not even the integral in (13.27) possesses a closed form solution. For this 
reason, sampling from the probability (13.27) is difficult. 

This observation motivates the replacement of h by a linear approximation. 
As common in this book, this approximation is obtained through a first order 
Taylor expansion, given by the following linear function: 

(megs te) = SP + Hanley — iui) + He (are âf) 
Here we use the following abbreviations: 


alk k [k 
al | = Ce âl ]) 
人 [大 

a = glaf u) 

The matrices Hm and H; are the Jacobians of h. They are the derivatives of h 
with respect to me, and z;, respectively, evaluated at the expected values of 
their arguments: 


Ay, 一 Vine, h(me,, +) | 


— alk). —,, lk] 
TETEE MeSH i a1 


He = Vah(Me, mol, aus auth, 
Under this approximation, the desired sampling distribution (13.27) is a 
Gaussian with the following parameters: 

W ari” 


uH = xBHIQU(a- a) a 
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where the matrix Q" is defined as follows: 


1335 Q = Qi + Hm, HT 


cz,t—1°"“m 
To see, we note that under out linear approximation the convolution theorem 
provides us with a closed form for the integral term in (13.27): 
(13.36) N(24; 2] + Heri- Heal"! QM) 


The sampling distribution (13.27) is now given by the product of this nor- 
mal distribution and the rightmost term in (13.27), the normal N (zi; ahl, Ri). 
Written in Gaussian form, we have 


(13.37) plt (ætl anus 214, Crt) = 1 exp (- HP) 
with 
1338 PE = i[(-2P9- He Heal)? QET (s - 2D - Hees + Heal") 


(x — 47 Re (s, — af) 


This expression is obviously quadratic in our target variable x+, hence p(x | 
oll, U1: Z1:; C1:t) is Gaussian. The mean and covariance of this Gaussian 
are equivalent to the minimum of pH and its curvature. Those are identified 
by calculating the first and second derivatives of př! with respect to x: 





(13.39) = -—HIQU-^'(- 2 B, Hus) + R7 (e — 8) 
(Hy Qe Ha + Ry'n — HTQU (a — 4p"! + Hedy") - Ry ay" 
(13.40) ore HIQUg, + Ry! 
The covariance x" of the sampling distribution is now obtained by the in- 
verse of the second derivative 
(13.41) x = Ee Q-a, + R 
The mean jl) of the sample distribution is obtained by setting the first 
derivative (13.39) to zero. This gives us: 
0342 4H = DE [ATOM (s a t neal) + nite] 


Tol (a, — ah +n [uro + nz] al 





x BTQF (a — 21) +a 
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This Gaussian is the approximation of the desired sampling distribution 
(13.26) in FastSLAM 2.0. Obviously, this proposal distribution is quite a 
bit more involved than the much simpler one for FastSLAM 1.0 in Equation 
(13.12). 


Updating the Observed Feature Estimate 


Just like our first version of the FastSLAM algorithm, FastSLAM 2.0 updates 
the posterior over TE feature estimates based on the measurement z; and 
the sampled pose z . The estimates at time t — 1 are once again represented 


BI. [k] 


by the mean jjj; , and the covariance x5 ] ,- The updated estimates are p; 


and E ] . The natüre of the update ae on whether or not a feature j was 
We at time t. For j 4 c, we already established in Equation (13.4) that 
the posterior over the feature remains unchanged. This implies that instead 
of updating the estimated, we merely have to copy it. 

For the observed feature j = c;, the situation is more intricate. Equation 
(13.5) already stated the posterior over observed features. Here we repeat it 
with the particle index k: 


k 
p(m,, | al | cues Zt) 
k k 
= 1n plz | me, al Le) p(m., [209 tis Aiea) 


NN GhGne aD) — S Nimen oah, a) 


ct tt 一 1 


As in (13.27), the nonlinearity of h causes this posterior to be non-Gaussian, 
which is at odds with FastSLAM 2.0’s Gaussian representation for feature es- 
timates. Luckily, the exact same linearization as above provides the solution: 


h(me,,%t) X lf) + Hm (Me, 一 ny) 


Notice that x; is not a free variable here, hence we can omit the third term in 
(13.28). This approximation renders the probability (13.43) Gaussian in the 
target variable m,,: 


k 
p(m., | al eee 


= nexp{—3lze— 4 — Hama - 3) Qe" 
k 
(a — P — Ha (me — ul, 4) 


k k]|-1 k 
-i (ma 一 TN 1) aS 1 (me, 一 un 
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The new mean and covariance are obtained using the standard EKF mea- 
surement update equations: 


k k k]—1 
RUU ee LN 
uM, = "M 十 KM (= ltl) 
EE ea s 


We notice this is quite a bit more complicated than the update in FastSLAM 
1.0, but the additional effort in implementing this often pays out, in terms of 
improved accuracy. 


Calculating Importance Factors 


The particles generated thus far do not yet match the desired posterior. In 
FastSLAM 2.0, the culprit is the normalizer ^l in (13.27), which is usually 
different for each particle k. These differences are not yet accounted for in 
the resampling process. As in FastSLAM 1.0, the importance factor is given 
by the following quotient. 


[k] target distribution 
we = 





proposal distribution 


Once again, the target distribution that we would like our particles to assume 
is given by the path posterior, p(a | 21:5 23:5, Ci). Under the asymptoti- 
cally correct assumptions that paths in al have been generated according 
to the target distribution one time step earlier, pal | Z14—1, Uist—1, C14—1), 
we note that the proposal distribution is now given by the product 


pih s | Z14—1, 1:1, C1t-1) pial | $1 ET T, 
The second term in this product is the pose sampling distribution (13.27). 
The importance weight is obtained as follows: 


[k] 
[k] _ poe” | Urt, 2a ens) 
^ m D [M] 
pA, | Wa 1 Ult, Z1:t Crt) D(Zi: 3 | U1s—1; Ž1:t—1; C1:t—-1) 








k k k 
pla! l | all i, Uit zn Crit) ptl | U1:t, 21:t) Crit) 
k 
t 


k k 
| ahl, U1:t; Zl:t; Cit) plal ， | U1:t-1) Z1:t—1; cl:t_1) 


k 
pall, | U1:t; Zl:t; C14) 

















DVUU14—1 U1:4—1; Ž1:t—1; Clt-1) 
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k 
Bayes p(z | ud i, Urt Zit-1, C1) plt | ux: 214-1) C14) 
k 
pact, | U1s—1; Ž1:t—1; CLt-1) 
[k] [k] 
Markov plz | Ti:t—1; U1:t; Z1:t—1; C1:t) plti | U1:t-1; Z14—1, C1:t—-1) 
uu [k] 


D(Zi:t_1 | U1:t—1; Ž1:t—1; CLt-1) 
= [k] 
zm n p(z: | Big qs Ut Zl:t—1, C1:t) 


The reader on notice that this expression is the inverse of our normalization 
constant ni*l in (13.27). Further transformations give us the following form: 


人 n f ples Fn sunt, renser) 


k 
plz | chl i ut 214-15. cl) da, 


Markov [k] 


k 
7] [ve | we, all a uit, 21-15 Cit) p(zi | Ti 


= n ffo (a | Mo, £e, al 1» U1:t; Z1:t— 1, C1) 


p(m., | ae, oll 1» U1:t; Z1:t—15 C1: t) dime, plz | zl TS dat 


Markov [k] 
= T] / plt | al u) i plz | The, , Lt, Ct) 
N— ese 


~ N (an g(a! [k ly sut), Re) ~ N (zt;h(me,,£t),Qt) 


1? Ut) da, 


k] 
p(m., dri e oio ziat) dme, da, 
一 一 
[k] (ly 


~N (mess Me, pite, ,t-1 


We find that this expression can once again be approximated by a Gaussian 
over measurements z; by linearizing h. As itis easily shown, the mean of the 
resulting Gaussian is 2;, and its covariance is 


(353 LË! = HTQ,H,- HY, HE +R 


Put differently, the (non-normalized) importance factor of the k-th particle is 
given by the following expression: 


1 
(13.54 wf! = [mL 2 exp {4a — &) LE" (a - à) 


As in FastSLAM 1.0, particles generated in Steps 1 and 2, along with their 
importance factor calculated in Step 3, are collected in a temporary particle 
set. 

The final step of the FastSLAM 2.0 update is a resampling step. Just as 
in FastSLAM 1.0, FastSLAM 2.0 draws (with replacement) M particles from 
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Figure 13.7 The data association problem in SLAM. This figure illustrates that the 
best data association may vary even within regions of high likelihood for the pose of 
the robot. 


Pose uncertainty 


the temporary particle set. Each particle is drawn with a probability pro- 
portional to its importance factor w. The resulting particle set represent 


(asymptotically) the desired factored posterior at time t. 


Unknown Data Association 


This section extends both variants of the FastSLAM algorithm to cases where 
the correspondence variables c;., are unknown. A key advantage of using 
particle filters for SLAM is that each particle can rely on its own, local data 
association decisions. 

We remind the reader that the data association problem at time t is the 
problem of determining the variable c; based on the available data. This 
problem is illustrated in Figure 13.7: Here a robot observes two features in 
the world. Depending on its actual pose relative to these features, these mea- 
surements correspond to different features in the map (depicted as stars in 
Figure 13.7). 

So far, we discussed a number of data association technique using argu- 
ments such as maximum likelihood. Those techniques had in common that 
there is only a single data association per measurement, for the entire filter. 
FastSLAM, by virtue of using multiple particles, can determine the corre- 
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spondence on a per-particle basis. Thus, the filter not only samples over 
robot paths, but also over possible data association decisions along the way. 

This is one of the key features of FastSLAM, which sets it aside from the 
rich body of Gaussian SLAM algorithms. As long as a small subset of the 
particles is based on the correct data association, data association errors are 
not as fatal as in EKF approaches. Particles subject to such errors tend to pos- 
sess inconsistent maps, which increases the probability that they are simply 
sampled away in future resampling steps. 

The mathematical definition of the per-particle data association is straight- 
forward, in that it generalizes the per-filter data association to individual 
particles. Each particle maintains a local set of data association variables, de- 
noted alk In maximum likelihood data association, each el is determined 
by maximizing the likelihood of the measurement zz: 


d" = argmax (z | alk] alle dt) 
Dre 8 DAZt | Ct, C15 Ug) Zbt-1; Udit 
Ct 


An alternative is the data association sampler (DAS), which samples the data 
association variable according to their likelihood 

A A [k] 

& ~ 1 plz | Ct, Ĉr:t-1, V1.4) 71:t—1» ust) 


Both techniques, ML and DAS, make it possible to estimate the number of 
features in the map. SLAM techniques using ML create new features in the 
map if the likelihood falls below a threshold po for all known features in 
the map. DAS associates an observed measurement with a new, previously 
unobserved feature stochastically. They do so with probability proportional 
to npo, where 77 is a normalizer defined in (13.56). 

AL] 


ak k 
C, d n p(z | cn êl PET 
For both techniques, the likelihood is calculated as follows: 
alk k 
plz | Ct, a, ll, Z1:t—15 U1:t) 
alk k 
= [ve | Mo, Ces Cl, Iz ua ure) 
alk k 
p(m., | Ct, eM, atl, Z1:t—15 uj) dme, 
k ak k 
= if plz ha :6521). ple | 855 asl aia). dma 
——————————— 
~ N Ga ih(me, aU), Q,) S (ab cung 
Linearization of enables us to obtain this in closed form: 


(z | alk] [k] ) 
Pl4t | Cty Cati Tt 5 Z1:t—1; Uit 
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k],-i k k k]- k k 
2n QE 72 exp [- 36 — heuil n TE (s — null, oaf) 


The variable aye was defined in Equation (13.35), as a function of the data 
association variable cj. New features are added to the map in exactly the 
same way as outlined above. In the ML approach, a new feature is added 





when the probability p(z; | cz, eM, al Z1, U1) falls beyond a threshold 
po. The DAS includes the hypothesis that an observation corresponds to a 
previously unobserved feature in its set of hypotheses, and samples it with 
probability npo. 


Map Management 


Map management in FastSLAM is largely equivalent to EKF SLAM, with a 
few particulars arising from the fact that data association is handled on a 
per-particle level in FastSLAM. 

As in the alternative SLAM algorithms, any newly added feature requires 
the initialization of a new Kalman filter. In many SLAM problems the mea- 
surement function h is invertible. This is the case, for example, for robots 
measuring range and bearing to features in the plane, in which a single mea- 
surement suffices to produce a (non-degenerate) estimate on the feature lo- 
cation. The initialization of the EKF is then straightforward: 


all ~ p(z, | afl, ut) 
k c k 
ul = hien a) 
x" = (HETOHEN with HE = h(a, s") 
yi = 
t = Po 


Notice that for newly observed features, the pose ol" is sampled according to 


the motion model p(z; | oll uz). This distribution is equivalent to the Fast- 
SLAM sampling distribution (13.26) in situations where no previous location 
estimate for the observed feature is available. 

Initialization techniques for situations in which h is not invertible are dis- 
cussed in Deans and Hebert (2002). In general, such situations require the 
accumulation of multiple measurements, to obtain a good estimate for the 
linearization of h. 

To accommodate features introduced erroneously into the map, FastSLAM 
features a mechanism for eliminating features that are not supported by suf- 
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ficient evidence. Just as in EKF SLAM, FastSLAM does so by keeping track 
of the log odds of the actual existence of individual features in the map. 

Specifically, when a feature is observed, its log odds for existence is incre- 
mented by a fixed amount, which is calculated using the standard Bayes filter 
formula. Similarly, when a feature is not observed even though it should 
have, such negative information results in a decrement of the feature exis- 
tence variable by a fixed amount. Features whose variable sinks below a 
threshold value are then simply removed from the list of particles. It is also 
possible to implement a provisional feature list in FastSLAM. Technically this 
is trivial, since each feature possesses its own particle. 


The FastSLAM Algorithms 


Tables 13.2 and 13.3 summarize both FastSLAM variants with unknown data 
association. In both algorithms, particles are of the form 


k k k k k k k k k 
x E NM DE SU cu af cM ada n 


t 


In addition to the pose oll and the feature estimates m l and 3s r each par- 


ticle maintains the number of features NE in its local map, and each feature 
carries a probabilistic estimate of its existence 7 办 Iterating the filter requires 
time linear in the maximum number of features max, NIS in each map, and 
it is also linear in the number of particles M. Further below, we will discuss 
advanced data structure that yield more efficient implementations. 

We note that both versions of FastSLAM, as described here, consider a 
single measurement at a time. As before, this choice is made for notational 
convenience, and many of the techniques discussed in previous SLAM chap- 
ters can be brought to bear. 


Efficient Implementation 


At first glance, it may appear that each update in FastSLAM requires O(M N) 
time, where M is the number of particles M and N the number of features in 
the map. The linear complexity in M is unavoidable, given that we have to 
process M particles with each update. The linear complexity in N is the re- 
sult of the resampling process. Whenever a particle is drawn more than once 
in the resampling process, a "naive" implementation might duplicate the en- 
tire map attached to the particle. Such a duplication process is linear in the 
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1: Algorithm FastSLAM 1.0(z;, ut, Y:—1): 


























2: fork = 1 to M do / / loop over all particles 
3: retrieve oe js QNIN, (ul D x D a PE 

NE T ee from Yi-1 
4: al") ~ p(w | of, ue) // sample new pose 
5: for j — 1 to NE, do // measurement likelihoods 
6: 27S RG i? xl) // measurement prediction 
7: H; =k (P Y a1) / / calculate Jacobian 
8: Q; = Hj Dh H d + Qt // measurement covariance 
9: wj = |\2rQ;|72 exp pee 

Q;'(a — £i) / / likelihood of correspondence 
10: endfor 
11: Wi, NIS. — po / / importance factor, new feature 
12: wl = max wj // max likelihood correspondence 
13: ĉ = argmax uj / / index of ML feature 
14: NE = max{ N}, ê} // new number of features in map 
15: for j = 1 to TD do // update Kalman filters 
16: ifj=ĉ=1+ NE, then / / is new feature? 
17: pl = =h Ae gll) / / initialize mean 
18: Hy = hi (ul, a) xl = (m) Qu; // initialize covar. 
19: i= zu / / initialize counter 
20: else if j = ê < NIK then // is observed feature? 
21: K-N n 1H A Oe // calculate Kalman gain 
22: Hz Bee 1 +K(z— 2e) — // update mean 
23: sit =(1-K Hj ld [s] Lg / / update covariance 
24: je = Da +1 // increment counter 

see next page for continuation 
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25: 
26: 
27: 
28: 


29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 


38: 


39: 
40: 


41: 
42: 


43: 


44: 


45: 








continued from the previous page 








else // all other features 
ues // copy old mean 
S = EN / / copy old covariance 


if mm outside perceptual 
range of q^ then / / should feature have been seen? 


[k] _ -[k] 
tjt — lj t-11 // no, do not change 
else 
iH =a gei // yes, decrement counter 
jt 6a yes, 


ifi. « 0 then 
discard feature j // discard dubious features 
endif 
endif 
endif 


endfor 


add an LEE wll NEP CN to Yox 
(4 t 1,t 1,01 nile NI NI 


endfor 


Y,-0 / / construct new particle set 


do M times / / resample M particles 


draw random index k 
with probability œ wlk! // resample 


add a Cu E xi i) fev 
(ai t 1,t> 1,t9 "1 nile NI NI 


enddo 


return Y, 








Table 13.2 The algorithm FastSLAM 1.0 with unknown data association. This ver- 
sion does not implement any of the efficient tree representations discussed in the 


chapter. 
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1: Algorithm FastSLAM 2.0(z:, ut, Y:—1): 























2: fork =1toM do // loop over all particles 
3: retrieve (ae 13 NE, (u m T x n futs 

人 
A: for j = 1 to NEL do // calculate sampling distribution 
5: Êj = (aw) // predict pose 
6: z= h(ultl 2355) / / predict measurement 
7: Az j = Vz h(a", £5,t) // Jacobian wrt pose 
8: Hing = Vm; bu s. $j4) / / Jacobian wrt map feature 
9: Qj — Qi t An,; x r HT // measurement information 
10: Mg [HZ Qo Hz "E 1 7 Cov of proposal distribution 
11: Hasj = Ex Haj Q7' 

(zt — Zj) + $j4 // mean of proposal distribution 
12: x l e IN (Hardi in) / / sample pose 
13: £j — = h(ul $3 n) / / measurement prediction 
14: Tj = [ons exp (-i 

(z= 23)" Q77 (z 一 2;)} // correspondence likelihood 
15: endfor 
16: Tru UN = po / / likelihood of new feature 
17: 6 = argmax Tj // ML correspondence 
18: NE x: max{ N}, ê} // new number of features 
19: for j = 1 to NE do // update Kalman filters 
20: ifj=ĉ=1+ NE, then / / is new feature? 
21: e ~ p(z: | all u) // sample pose 
22: ul | =h! (z, z 的 ) // initialize mean 
23: Hmi = Vm; h(ul® ] e) / / Jacobian wrt map feature 
24: 3g zu Hi. Q: H // initialize covariance 
25: if =1 / / initialize counter 
26: whl 一 Do // importance weight 
27: else if j = ê < NEL then // is observed feature? 
28: z! = = al 
29: K= x UHR s Oo / / calculate Kalman gain 
see next page for continuation 
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Hjt = I4 + K(z—2;)  //updatemean 


[k] 


i =ġ; 1 t1 // increment counter 
L= Hej Ri HL; Hm X Hajt Q 

k] -4 od 
wr = |2r L| 2 exp { 5 

(ze — £5)" L^! (a — 2;)} // importance weight 
else // all other features 

mh = Hes / / copy old mean 
x = 2 NA / / copy old covariance 


if um outside perceptual 
range of x] then // should feature have been seen? 


il = ihl // no, do not change 

else 
= = abe = // yes, decrement counter 
ifi < 0 then 

discard feature j // discard dubious features 

endif 

endif 

endif 


endfor 
add O enon x i.) tex. 
(4 t 1,0 1,001 NIS NI nil 
endfor 
Y, = // construct new particle set 


do M times // resample M particles 


draw random index k 
with probability œ wlk! // resample 


[k] aik] / [k] yik] [k] [k] [k] ik] 
ada (4 , Nl (GEL BE d m LE 
enddo 


return Y, 





Table 13.3 The FastSLAM 2.0 Algorithm, stated here unknown data association. 
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size of the map N. Furthermore, a naive implementation of data association 
may result in evaluating the measurement likelihood for each of the N fea- 
tures in the map, resulting again in linear complexity in N. We note that a 
poor implementation of the sampling process might easily add another log N 
to the update complexity. 

Efficient implementations of FastSLAM require only O(M log N) update 
time. This is logarithmic in the size of the map N. First, consider the situ- 
ation with known data association. Linear copying costs can be avoided by 
introducing a data structure for representing particles that allow for more 
selective updates. The basic idea is to organize the map as a balanced binary 
tree. Figure 13.8a shows such a tree for a single particle, in the case of M — 8 
features. Notice that all Gaussian parameters Tid and xn for all j are located 
at the leaves of the tree. Assuming that the tree is approximately balanced, 
accessing a leaf requires time logarithmic in N. 

Suppose FastSLAM incorporates a new control u; and a new measurement 
z,. Each new particle in Y; will differ from the corresponding one in Y;. ; in 
two ways: First, it will possess a different pose estimate obtained via (13.26), 
and second, the observed feature's Gaussian will have been updated, as spec- 
ified in Equations (13.47) and (13.48). All other Gaussian feature estimates, 
however, will be equivalent to the generating particle. When copying the 
particle, thus, only a single path has to be modified in the tree representing 
all Gaussians, leading to the logarithmic update time. 

An illustration of this "trick" is shown in Figure 13.8b: Here we assume 
ci — 3, hence only the Gaussian parameters ui and x are updated. Instead 
of generating an entire new tree, only a single path is created, leading to the 
Gaussian ci = 3. This path is an incomplete tree. The tree is completed by 
copying the missing pointers from the tree of the generating particle. Thus, 
branches that leave the path will point to the same (unmodified) subtree as 
that of the generating tree. Clearly, generating this tree takes only time loga- 
rithmic in N. Moreover, accessing a Gaussian also takes time logarithmic in 
N,since the number of steps required to navigate to a leaf of the tree is equiv- 
alent to the length of the path (which is by definition logarithmic). Thus, both 
generating and accessing a partial tree can be done in time O(log N). Since 
in each updating step M new particles are created, an entire update requires 
time in O(M log N). 

Organizing particles in trees raises the question as to when to deallocate 
memory. Memory deallocation can equally be implemented in amortized 
logarithmic time. The idea is to assign a variable to each node—internal 
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Figure 13.8 (a) A tree representing N = 8 feature estimates within a single particle. 
(b) Generating a new particle from an old one, while modifying only a single Gaus- 
sian. The new particle receives only a partial tree, consisting of a path to the modified 
Gaussian. All other pointers are copied from the generating tree. This can be done in 
time logarithmic in N. 
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Memory Usage of Log(N) FastSLAM vs. Linear FastSLAM - 100 Particles 
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Figure 13.9 Memory requirements for linear and log( N) version of FastSLAM 1.0. 


or leaf—that counts the number of pointers pointing to it. The counter of 
a newly created node will be initialized by 1. It will be incremented as 
pointers to a node are created in other particles. Decrements occur when 
pointers are removed (e.g., pointers of pose particles that fail to survive the 
resampling process). When a counter reaches zero, its children’s counters are 
decremented and the memory of the corresponding node is deallocated. The 
process is then applied recursively to all children of the node whose counter 
may have reached zero. This recursive process will require O(M log N) time 
on average. 

The tree also induces substantial memory savings. Figure 13.9 shows the 
effect of the efficient tree technique on the memory consumed by FastSLAM, 
measured empirically. This graph is the result of an actual implementation of 
FastSLAM 1.0 with M — 100 particles for acquiring feature-based maps. The 
graph shows nearly two orders of magnitude savings for a map with 50,000 
features. The relative savings in update time are similar in value. 

Obtaining logarithmic time complexity for FastSLAM with unknown data 
association is more difficult. Specifically, we need a technique that restricts 
data association search to the local neighborhood of a feature, to avoid calcu- 
lating the data association probability for all N features in the map. Further, 
the tree has to remain approximately balanced. 

There indeed exists variants of kernel density trees, or kd-trees, that can 
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meet these assumptions, assuming that the variance in the sensor measure- 
ments is small compared to the overall size of the map. For example, the 
bkd-tree proposed by Procopiuc et al. (2003) maintains a sequence of trees of 
growing complexity. By carefully shifting items across those trees, a loga- 
rithmic time recall can be guaranteed under amortized logarithmic time for 
inserting new features in the map. Another is the DP-SLAM algorithm pro- 
posed by Eliazar and Parr (2003), which uses history trees for efficient storage 
and retrieval, similar to the one described here. 


FastSLAM for Feature-Based Maps 


Empirical Insights 


The FastSLAM algorithm has been applied to a number of map represen- 
tation and sensor data. The most basic application concerns feature-based 
maps, assuming that the robot is equipped with a sensor for detecting range 
and bearing to landmarks. One such data set is the Victoria Park dataset, 
which we already discussed in Chapter 12. Figure 13.10a shows the path 
of the vehicle obtained by integrating the estimated controls. Controls are 
a poor predictor of location for this vehicle; after 30 minutes of driving, the 
estimated position of the vehicle is well over 100 meters away from its GPS 
position. 

The remaining three panels of Figure 13.10 show the output of FastSLAM 
1.0. In all these diagrams, the path estimated by GPS is shown as a dashed 
line, and the output of FastSLAM is shown as a solid line. The RMS error of 
the resulting path is just over 4 meters over the 4 km traverse. This experi- 
ment was run with M = 100 particles. This error is indistinguishable from 
the error of other state-of-the-art SLAM algorithms, such as the ones dis- 
cussed in previous chapters. The robustness of FastSLAM becomes apparent 
in Figure 13.10d, which plots the result for an experiment in which we simply 
ignored the motion information. Instead, the odometry-based motion model 
was replaced by a Brownian motion model. The average error of FastSLAM 
is statistically indistinguishable from the error obtained before. 

When implementing FastSLAM in feature-based maps, it is important to 
consider negative information. When negative information is used to esti- 
mate the existence of each feature, as described in Chapter 13.6, many spuri- 
ous features can be removed from the map. Figure 13.11 shows the Victoria 
Park map built with and without considering negative evidence. Here the 
use of negative information results in 44% percent fewer features in the re- 
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(a) Raw vehicle path (b) FastSLAM 1.0 (solid), GPS path (dashed) 























(c) Paths and map with aerial image 


Na 2g 




















Figure 13.10 (a) Vehicle path predicted by the odometry; (b) True path (dashed line) 
and FastSLAM 1.0 path (solid line); (c) Victoria Park results overlayed on aerial im- 
agery with the GPS path in blue (dashed), average FastSLAM 1.0 path in yellow 
(solid), and estimated features as yellow circles. (d) Victoria Park Map created with- 
out odometry information. Data and aerial image courtesy of José Guivant and Ed- 
uardo Nebot, Australian Centre for Field Robotics. 


sulting map. While the correct number of features is not available, visual 
inspection of the maps suggests that many of the spurious features have been 
eliminated. 

It makes sense to compare FastSLAM to EKF SLAM, which continues to be 
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(a) Map without feature elimination (b) Map with feature elimination 























Figure 13.11 FastSLAM 1.0 (a) without and (b) with feature elimination based on 
negative information. 


a popular benchmark algorithm. For example, Figure 13.12 compares the ac- 
curacy of FastSLAM 1.0 with that of the EKF, for various numbers of particles 
from 1 to 5,000. The error of the EKF SLAM is shown as a dashed horizon- 
tal line in Figure 13.12 for comparison. The accuracy of FastSLAM 1.0 ap- 
proaches the accuracy of the EKF as the number of particles is increased, and 
it is statistically indistinguishable from that of the EKF past approximately 10 
particles. This is interesting because FastSLAM 1.0 with 10 particles and 100 
features requires an order of magnitude fewer parameters than EKF SLAM 
in order to achieve this level of accuracy. 

In practice, FastSLAM 2.0 yields superior results to FastSLAM 1.0, though 
the improvement is of significance only under certain circumstances. As a 
rule of thumb, both algorithms produce comparable results when the num- 
ber of particles M is large, and when the measurement noise is large com- 
pared to the motion uncertainty. This is illustrated in Figure 13.13, which 
graphs the accuracy of either FastSLAM variant as a function of the mea- 
surement noise, using M = 100 particles. The most important finding here is 
FastSLAM 1.0’s relatively poor performance in low-noise simulations. One 
way to test whether a FastSLAM 1.0 implementation suffers from this pathol- 
ogy is to artificially inflate the measurement noise in the probabilistic model 
p(z | x). If, as a result of this inflation, the overall map error goes down—not 
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Accuracy of FastSLAM vs. the EKF on Simulated Data 
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Figure 13.12 A comparison of the accuracy of FastSLAM 1.0 and the EKF on simu- 
lated data. 


up—it is time to switch to FastSLAM 2.0. 


Loop Closure 


No algorithm is perfect. There exists problems in which FastSLAM is infe- 
rior to its Gaussian counterparts. One such problem involves loop closure. 
In loop closure, a robot moves through unknown terrain and at some point 
encounters features not seen for a long time. It is here that maintaining the 
correlations in a SLAM algorithm is particularly important, so that the infor- 
mation acquired when closing a loop can be propagated through the entire 
map. EKFs and GraphSLAM maintain such correlations directly, whereas 
FastSLAM maintains them through its diversity in the particle sets. Thus, 
the ability to close loops, depends on the number of particles M. Better di- 
versity in the sample set results in better loop closing performance, because 
new observations can affect the pose of the vehicle further back in the past. 
Unfortunately, by pruning away improbable trajectories of the vehicle, re- 
sampling eventually causes all of the FastSLAM particles to share a common 
history at some point in the past. New observations cannot affect the posi- 
tions of features observed prior to this point. This common history point can 
be pushed back in time by increasing the number of particles M. This pro- 
cess of throwing away correlation data over time enables FastSLAM's effi- 
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Accuracy of FastSLAM Algorithms On Simulated Data 
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Figure 13.13 FastSLAM 1.0 and 2.0 with varying levels of measurement noise: As to 
be expected, FastSLAM 2.0 is uniformly superior to FastSLAM 1.0. The difference is 
particularly obvious for small particle sets, where the improved proposal distribution 
focuses the particles much better. 


cient sensor updates. This efficiency comes at the cost of slower convergence 
speed. Throwing away correlation information means that more observa- 
tions will be required to achieve a given level of accuracy. Clearly, FastSLAM 
2.0's improved proposal distribution ensures that fewer particles are elimi- 
nated in resampling compared to FastSLAM 1.0, but it does not alleviate this 
problem. 

In practice, diversity is important, and it is worthwhile to optimize the im- 
plementation so as to maintain maximum diversity. Examples of loop closure 
are shown in Figure 13.15. These figures show the histories of all M particles. 
In Figure 13.15a, the FastSLAM 1.0 particles share a common history part of 
the way around the loop. New observations can not affect the positions of 
features observed before this threshold. In this case of FastSLAM 2.0, the al- 
gorithm is able to maintain diversity that extends back to the beginning of 
the loop. This is crucial for reliable loop closing and fast convergence. 

Figure 13.16a shows the result of an experiment comparing the loop clos- 
ing performance of FastSLAM 1.0 and 2.0. As the size of the loop increases, 
the error of both algorithms increases. However, FastSLAM 2.0 consistently 
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Figure 13.14 Map of Victoria Park by FastSLAM 2.0 with M = 1 particle. 


outperforms FastSLAM 1.0. Alternately, this result can be rephrased in terms 
of particles. FastSLAM 2.0 requires fewer particles to close a given loop than 
FastSLAM 1.0. 

Figure 13.16b shows the results of an experiment comparing the conver- 
gence speed of FastSLAM 2.0 and the EKF. FastSLAM 2.0 (with 1, 10, and 100 
particles) and the EKF were each run 10 times around a large simulated loop 
of features, similar to the ones shown in Figure 13.16a&b. Different random 
seeds were used for each run, causing different controls and observations to 
be generated for each loop. The RMS position error in the map at every time 
step was averaged over the 10 runs for each algorithm. 

As the vehicle goes around the loop, error should gradually build up in 
the map. When the vehicle closes the loop at iteration 150, revisiting old 
features should affect the positions of features all around the loop, causing 
the overall error in the map to decrease. This clearly happens in the EKF. 
FastSLAM 2.0 with a single particle has no way to affect the positions of past 
features so there is no drop in the feature error. As more particles are added 
to FastSLAM 2.0, the filter is able to apply observations to feature positions 
further back in time, gradually approaching the convergence speed of the 
EKF. Clearly, the number of particles necessary to achieve convergence time 
close to the EKF will increase with the size of the loop. The lack of long-range 
correlations in the FastSLAM representation is arguably the most important 
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Particles share common history here 


























Figure 13.15 FastSLAM 2.0 can close larger loops than FastSLAM 1.0 given a con- 
stant number of particles. 


weakness of FastSLAM algorithm over Gaussian SLAM techniques. 


Grid-based FastSLAM 


The Algorithm 


In Chapter 9 we studied occupancy grid maps as a volumetric representation 
of robot environments. The advantage of such a representation is that it does 
not require any predefined definition of landmarks. Instead, in can model 
arbitrary types of environments. In the remainder of this chapter, we will 
therefore extend the FastSLAM algorithm to such representations. 

To adapt the FastSLAM algorithm to occupancy grid maps, we need three 
functions that we already defined in previous sections. First, we have to sam- 
ple from the motion posterior p(x; | oll, uz) as in Equation 13.12. Hence, we 
need such a sampling technique. Second, we need a technique for estimat- 
ing the map of each particle. It turns out that we can rely on occupancy grid 
mapping, as described in Chapter 9. Finally, we need to compute the impor- 
tance weights of the individual particles. That is, we require an approach to 
compute the likelihood p(z; | z?, m") of the observation z;, conditioned on 
the pose z^, the map ml, and the most recent measurement z;. 

As it turns out, the extension of FastSLAM to occupancy grid maps is quite 
straightforward. Table 13.4 describes FastSLAM with occupancy grid maps. 
Not surprisingly, this algorithm borrows parts of Monte Carlo Localization 
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Accuracy of FastSLAM Algorithms vs. Loop Size Convergence of FastSLAM and the EKF Closing a Large Loop 
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Figure 13.16 (a) Accuracy as a function of loop size: FastSLAM 2.0 can close larger 
loops than FastSLAM 1.0 given a fixed number of particles. (b) Comparison of the 
convergence speed of FastSLAM 2.0 and the EKF. 


(see Table 8.2) and occupancy grid mapping (see Table 9.1). The individual 
functions used in this algorithm are variants of those used in the localization 
and mapping algorithms. 


In particular the function measurement, model map(z, xl", ml) 


computes the likelihood of the measurement z; given the pose al repre- 
sented by the k-th particle and given the map ml, computed based on the 
previous measurement and the trajectory represented by this particle. Fur- 
thermore, the function updated occupancy. grid(z;, all : m!) computes 
a new occupancy grid map, given the current pose a of the k-th particle, 


the map mE, associated to it, and the measurement zz. 


Empirical Insights 


Figure 13.17 shows a typical situation of the application of the grid-based 
FastSLAM algorithm. Shown there are three particles together with their 
associated maps. Each particle represents a potential trajectory of the robot, 
which explains why each occupancy grid map looks different. The center 
map is the best in terms of global consistency. 

A typical map acquired with the FastSLAM algorithm is depicted in Fig- 
ure 13.19. The size of this environment is 28m x 28m. The length of the 
robot's trajectory is 491m and the average speed was 0.19m/s. The reso- 
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Figure 13.17 Application of the grid-based variant of the FastSLAM algorithm. Each 
particle carries its own map and the importance weights of the particles are computed 
based on the likelihood of the measurements given the particle’s own map. 

















Figure 13.18 Occupancy grid map generated from laser range data and based on 
pure odometry. All images courtesy of Dirk Hahnel, University of Freiburg. 
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Figure 13.19 Occupancy grid map corresponding to the particle with the highest 
accumulated importance weight obtained by the algorithm listed in Table 13.4 from 
the data depicted in Figure 13.18. The number of particles to create this experiment 
was 500. Also depicted in this image is the path represented by the particle with the 
maximum accumulated importance weight. 





(a) (b) 























Figure 13.20 Trajectories of all samples shortly before (left) and after (right) closing 
the outer loop of the environment depicted in Figure 13.19. Images courtesy of Dirk 
Háhnel, University of Freiburg. 
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A 


Algorithm FastSLAM occupancy grids(X, .,, ur, z;): 


d, mA E 

fork = 1 to M do 
oll = sample motion model(u;, cll) 
wl"! = measurement model map(z;, gl. ml) 
ml! = updated_occupancy_grid(z, zl : mI) 
A, = Ae (ay, mpl wp) 

endfor 


fork = 1 to M do 


draw i with probability cx wl” 
add (x!!! mlt”) to A, 


endfor 


Eo 
Yr S 


return X; 











Table 13.4 The FastSLAM algorithm for learning occupancy grid maps. 


lution of the map is 10cm. To learn this map, as few as 500 particles were 
used. During the overall process the robot encountered two loops. A map 
calculated from pure odometry data is shown in Figure 13.18, illustrating the 
amount of error in the robot's odometry. 

The importance of using multiple particles becomes evident in Fig- 
ure 13.20, which visualizes the trajectories of the samples shortly before and 
after closing a loop. As the left image illustrates, the robot is quite uncertain 
about its position relative to the starting position, hence the wide spread of 
particles at the time of loop closure. However, a few resampling steps after 
the robot re-enters known terrain suffice to reduce the uncertainty drastically 
(right image). 
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Summary 


This chapter has presented the particle filter approach to the SLAM problem, 
known as the FastSLAM algorithm. 


* The basic idea of FastSLAM is to maintain a set of particles. Each particle 
contains a sampled robot path. It also contains a map, but here each fea- 
ture in the map is represented by its own, local Gaussian. The resulting 
representation requires space linear in the size of the map, and linear in 
the number of particles. 


* The "trick" to represent a map as a set of separate Gaussians—instead of 
one big joint Gaussian as was the case in the EKF SLAM algorithm—is 
possible because of the factorial structure of the SLAM problem. Specif- 
ically, we noticed that the map features are conditionally independent 
given the path. By factoring out the path (one per particle), we can then 
simply treat each map feature as independent, thereby avoiding the costly 
step of maintaining the correlations between them that plagued the EKF 
approach. 


* The update in FastSLAM follows directly that of the conventional particle 
filter: Sample a new pose, then update the observed features. This update 
can be performed online, and FastSLAM is a solution to the online SLAM 
problem. 


* Consequently, we noted that FastSLAM solves both SLAM problems: The 
offline SLAM problem and the online SLAM problem. Our derivation 
treated FastSLAM as an offline technique, in which particles represented 
samples in path space, not just in the momentary pose space. However, 
it so turned out that none of the update steps require knowledge of any 
pose other than the most recent one, so one can safely discard past pose 
estimates. This makes it possible to run FastSLAM as a filter. It also avoids 
that the size of particles grow linearly with time. 


* We encountered two different instantiations of FastSLAM, with version 
numbers 1.0 and 2.0. FastSLAM 2.0 is an advanced version of FastSLAM. 
It differs from the base version in one key idea: FastSLAM 2.0 incorporates 
the measurement when sampling a new pose. The math for doing so was 
somewhat more involved, but FastSLAM 2.0 is found to be superior in 
that it requires fewer particles than FastSLAM 1.0. 
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* The idea of using particle filters makes it possible to estimate data asso- 
ciation variables on a per-particle basis. Each particle can be based on 
a different data association. This provides FastSLAM with a straightfor- 
ward and powerful mechanism for solving data association problems in 
SLAM. Previous algorithms, specifically the EKF, GraphSLAM, and the 
SEIF, are forced to adopt a single data association decision for the entire 
filter at any point in time, and therefore require more care in selecting the 
value of the data association. 


* For efficiently updating the particles over time, we discussed tree repre- 
sentations of the map. These representations make it possible to reduce 
the complexity of FastSLAM updates from linear to logarithmic, enabling 
particles to share parts of the maps that are identical. The ideas of such 
trees is important in practice, as it enables FastSLAM to scale to 10° or 
more features in the map. 


* We also discussed techniques for utilizing negative information. One per- 
tains to the removal of features from the map that are not supported by 
sufficient evidence. Here FastSLAM adopts an evidence integration ap- 
proach familiar from the occupancy grid map chapter. Another pertains 
to the weighting of particles themselves: When failing to observe a feature 
in the map of a particle, such a particle can be devalued by multiplying 
its importance weight accordingly. 


* We discussed a number of practical properties of the two FastSLAM algo- 
rithms. Experiments show that both algorithms perform well in practice, 
both in feature-based maps and in volumetric occupancy grid-style maps. 
From a practical perspective, FastSLAM is among the best probabilistic 
SLAM techniques presently available. Its scaling properties are only ri- 
valed by some of the information filter algorithms described in the two 
previous chapters. 


* We extended FastSLAM 2.0 to different map representations. In one rep- 
resentation, the map was composed of points detected by a laser range 
finder. In this case, we were able to abandon the idea of modeling the 
uncertainty in the features through Gaussians, and rely on scan matching 
techniques to implement the forward sampling process of FastSLAM 2.0. 
The use of a particle filter leads to a robust loop closure technique. 


Possibly the biggest limitation of FastSLAM is the fact that maintains de- 
pendencies in the estimates of feature locations only implicitly, through the 
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diversity of its particle set. In certain environments this can negatively af- 
fect the convergence speed when compared to Gaussian SLAM techniques. 
When using FastSLAM care has to be taken to reduce the damaging effects 
of the particle deprivation problem in FastSLAM. 


Bibliographical Remarks 


The idea of computing distributions over sets of variables by combining samples with a para- 
metric density function is due to Rao (1945) and Blackwell (1947). Today, this idea has become 
a common tool in the statistics literature (Gilks et al. 1996; Doucet et al. 2001). A first mapping 
algorithm that uses particle filters for closing loops can be found in Thrun et al. (2000b). The for- 
mal introduction of Rao-Blackwellized particle filters into the field of SLAM is due to Murphy 
(2000a); Murphy and Russell (2001), who developed this idea in the context of occupancy grid 
maps. 

The FastSLAM algorithm was first developed by Montemerlo et al. (2002a), who also de- 
veloped a tree representation for efficiently maintaining multiple maps. The extension of this 
algorithm to high-resolution maps is due to Eliazar and Parr, whose algorithm DP-SLAM has 
generated maps from laser range scans with unprecedented accuracy and detail. Their central 
data structure is called ancestry trees, which extends FastSLAM's trees to update problems in 
occupancy-style maps. A more efficient version is known as DP-SLAM 2.0 (Eliazar and Parr 
2004). The FastSLAM 2.0 algorithm was developed by Montemerlo et al. (2003b). It is based on 
prior work by van der Merwe et al. (2001), who pioneered the idea of using the measurement as 
part of the proposal distribution in particle filter theory. The grid-based FastSLAM algorithm in 
this chapter is due to Háhnel et al. (2003b), who integrated the improved proposal distribution 
idea with Rao-Blackwellized filters applied to grid-based maps. A Rao-Blackwellized filter for 
tracking the status of doors in a dynamic office environment is described in Avots et al. (2002). 

One of FastSLAM's most important contributions lies in the area of data association, so a 
word is on order on the literature of data association in SLAM. The original SLAM work (Smith 
et al. 1990; Moutarlier and Chatila 1989a) resorted to maximum likelihood data association, as 
derived in detail by Dissanayake et al. (2001). A key limitation of these data association tech- 
niques was the inability to enforce mutual exclusivity: Two different features seen within a single 
sensor measurement (or in short temporal succession) cannot possibly correspond to the same 
physical feature in the world. Realizing this, Neira and Tardós developed techniques for testing 
correspondence for sets of features, which greatly reduced the number of data association er- 
rors. To accommodate the huge number of potential associations (exponential in the number of 
features considered at each point in time), Neira et al. (2003) proposed random sampling tech- 
niques in the data association space. All of these techniques, however, maintained a single mode 
in the SLAM posterior. Feder et al. (1999) applied the greedy data association idea to sonar data, 
but implemented a delayed decision to resolve ambiguities. 

The idea of maintaining multi-modal posterior in SLAM goes back to Durrant-Whyte et al. 
(2001), whose algorithm sum of Gaussians uses a Gaussian mixture for representing the poste- 
rior. Each mixture component corresponded to a different trace through the history of all data 
association decisions. FastSLAM follows this idea, but using particles instead of Gaussian mix- 
ture components. The idea of lazy data association can be traced back to other fields, such as 
the popular RANSAC algorithm (Fischler and Bolles 1981) in computer vision. The tree algo- 
rithm presented in the previous chapter is due to Háhnel et al. (2003a). As mentioned there, it 
parallels work by Kuipers et al. (2004). An entirely different approach for data association is 
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described in Shatkay and Kaelbling (1997); Thrun et al. (1998b), which use the expectation max- 
imization algorithm for resolving correspondence problems (see (Dempster et al. 1977)). The 
EM algorithm iterates a phase of data association for all features with a phase of map building, 
thereby performing search simultaneously in the space of numerical map parameters, and in 
the space of discrete correspondences. Araneda (2003) successfully uses MCMC techniques for 
data association in offline SLAM. 

The data association problem arises naturally in the context of multi-robot map integration. 
A number of papers have developed algorithms for localizing one robot relative to another un- 
der the assumption that they both operate in the same environment, and their maps overlap 
(Gutmann and Konolige 2000; Thrun et al. 2000b). Roy and Dudek (2001) developed a tech- 
nique by which robots had to rendezvous for integrating their information. For the general 
case, however, a data association technique must consider the possibility that maps might not 
even overlap. Stewart et al. (2003) developed a particle filter algorithm that explicitly models the 
possibility that maps do not overlap. Their algorithm contains a Bayesian estimator for calcu- 
lating this probability, which takes into consideration how “common” specific local maps are in 
the environment. The idea of matching sets of features for data association in multi-robot map- 
ping is due to Dedeoglu and Sukhatme (2000); see also Thrun and Liu (2003). A new non-planar 
representation of maps was proposed by Howard (2004), for circumvented inconsistency in in- 
complete maps. His approach led to remarkably accurate multi-robot mapping results (Howard 
et al. 2004). 


Exercises 


1. Name three key, distinct advantages for each of the following SLAM al- 
gorithms: EKF, GraphSLAM, and FastSLAM. 


2. Describe a set of circumstances under which FastSLAM 1.0 will fail to con- 
verge and FastSLAM 2.0 will converge to a correct map (with probability 
1). 


3. On page 443, we stated that conditioning on the most recent pose x, instead 
of the entire path x: is insufficient, as dependencies may arise through previous 
poses. Prove this assertion. You might prove it with an example. 


4. FastSLAM generates many different maps, one for each particle. This 
chapter left open how to combine these maps into a single posterior es- 
timate. Suggest two such methods, one for FastSLAM with known corre- 
spondence, and one for FastSLAM with per-particle data association. 


5. The improvement of FastSLAM 2.0 over FastSLAM 1.0 lies in the nature of 
the proposal distribution. Develop the same for Monte Carlo Localization: 
Devise a similar proposal distribution for MCL in feature-based maps, 
and state the resulting algorithm "MCL 2.0." For this exercise you might 
assume known correspondence. 
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6. Chapter 13.8 describes an efficient tree implementation, but it does not 
provide pseudo-code. In this exercise, you are asked to provide the cor- 
responding data structures and update equations for the tree, assuming 
that the number of features is known in advance, and that we do not face 
a correspondence problem. 


7. In this question, you are asked to verify empirically that FastSLAM in- 
deed maintains the correlations between feature estimates and the robot 
pose estimate. Specifically, you are asked to implement a simple Fast- 
SLAM 1.0 algorithm for linear Gaussian SLAM. Recall from previous ex- 
ercises that the motion and measurement equations in linear Gaussian 
SLAM are linear with additive Gaussian noise: 


te ~ N(ae-1 + ut, R) 
z = N (m; 一 Tr, Q) 


Run FastSLAM 1.0 in simulation. After t steps, fit a Gaussian over the 
joint space of feature locations and the robot pose. Compute the cor- 
relation matrix from this Gaussian, and characterize the strength of the 
correlations as a function of t. What are your observations? 


8. As mentioned in the text, FastSLAM is a Rao-Blackwellized particle fil- 
ter. In this question you are asked to design a Rao-Blackwellized fil- 
ter for a different problem: Localization with a robot that systematically 
drifts. Systematic drift is a common phenomena in odometry; just in- 
spect Figures 9.1 and 10.7 for two particularly strong instances of drift. 
Suppose you are given a map of the environment. Can you design a Rao- 
Blackwellized filter that simultaneously estimates the drift parameters of 
the robot and the global location of the robot in the environment? Your 
filter should combine particle filters with Kalman filters. 
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Markov Decision Processes 


Motivation 


This is the first chapter on probabilistic planning and control in this book. 
Thus far, the book has focused exclusively on robot perception. We have 
discussed a range of probabilistic algorithms that estimate quantities of in- 
terest from sensor data. However, the ultimate goal of any robot software 
is to choose the right actions. This and the following chapters will discuss 
probabilistic algorithms for action selection. 

To motivate the study of probabilistic planning algorithms, consider the 
following examples. 


1. A robotic manipulator grasps and assembles parts arriving in random 
configuration on a conveyor belt. The configuration of a part is un- 
known at the time it arrives, yet the optimal manipulation strategy re- 
quires knowledge of the configuration. How can a robot manipulate such 
pieces? Will it be necessary to sense? If so, are all sensing strategies 
equally good? May there exist manipulation strategies that result in a 
well-defined configuration without sensing? 


2. An underwater vehicle shall travel from Canada to the Caspian Sea. Shall 
it take the shortest route through the North Pole, running risk of loosing 
orientation under the Ice? Or should it take the longer route through open 
waters, where it can regularly relocalize using GPS, the satellite-based 
global positioning system? To what extent do such decisions depend on 
the accuracy of the submarine’s inertial sensors? 


3. A team of robots explores an unknown planet, seeking to acquire a joint 
map. Shall the robots seek each other to determine their relative location 
to each other? Or shall they instead avoid each other so that they can 
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cover more unknown terrain in shorter amounts of time? How does the 
optimal exploration strategy change when the relative starting locations 
of the robots are unknown? 


These examples illustrate that action selection in many robotics tasks is 
closely tied to the notion of uncertainty. In some tasks, such as robot ex- 
ploration, reducing uncertainty is the direct goal of action selection. Such 
problems are known as information gathering tasks. They will be studied in 
Chapter 17. In other cases, reducing uncertainty is merely a means to achiev- 
ing some other goal, such as reliably arriving at a target location. These tasks 
will be studied in this and the next chapters. 

From an algorithm design perspective, it is convenient to distinguish two 
types of uncertainty: uncertainty in action effects, and uncertainty in percep- 
tion. 

First, we distinguish deterministic from stochastic action effects. Many the- 
oretical results in robotics are based on the assumption that the effects of 
control actions are deterministic. In practice, however, actions cause uncer- 
tainty, as outcomes of actions are non-deterministic. The uncertainty arising 
from the stochastic nature of the robot and its environments mandates that 
the robot senses at execution time, and reacts to unanticipated situations— 
even if the environment state is fully observable. It is insufficient to plan a 
single sequence of actions and blindly execute it at run-time. 

Second, we distinguish fully observable from partially observable systems. 
Classical robotics often assumes that sensors can measure the full state of the 
environment. If this was always the case, then we would not have written 
this book! In fact, the contrary appears to be the case. In nearly all interesting 
real-world robotics problems, sensor limitations are a key factor. 

Obviously, robots should consider their current uncertainty when determin- 
ing what to do. When selecting a control action, at a minimum a robot should 
consider the various outcomes (which might include catastrophic failures), 
and weigh them by the probability that such outcomes might actually occur. 
However, robot control must also cope with future, anticipated uncertainty. 
An example of the latter was given above, where we discussed a robot that 
has the choice between a shorter path through a GPS-denied environment, 
with a longer one that reduces the danger of getting lost. Minimizing antici- 
pated uncertainty is essential for many robotic applications. 

Throughout this chapter, we will take a very liberal view and make vir- 
tually no distinction between planning and control. Fundamentally, both 
planning and control address the same problem: to select actions. They dif- 
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fer in the time constraints under which actions have to be selected, and in 
the role of sensing during execution. The algorithms described in this chap- 
ter are all similar in that they require an off-line optimization, or planning, 
phase. The result of this planning phase is a control policy, which prescribes a 
control action for any reasonable situation. In other words, the control policy 
is effectively a controller, in the sense that it can be used to determine robot 
actions with minimum computing time. By no means is the choice of algo- 
rithms meant to suggest that this is the only way to cope with uncertainty in 
robotics. However, it reflects the style of algorithms that are currently in use 
in the field of probabilistic robotics. 

The majority of algorithms discussed in this chapter assume finite state 
and action spaces. Continuous spaces are approximated using grid-style rep- 
resentations. 

The following four chapters are organized as follows. 


* This chapter discusses in depth the role of the two types of uncertainty 
and lays out their implications on algorithm design. As a first solution to 
a restricted class of problems, we introduce value iteration, a popular plan- 
ning algorithm for probabilistic systems. Our discussion in this chapter 
addresses only the first type of uncertainty: the uncertainty in robot mo- 
tion. It rests on the assumption that the state is fully observable. The 
underlying mathematical framework is known as Markov decision processes 
(MDP). 


* Chapter 15 generalizes the value iteration technique to both types of un- 
certainty, in action effects and perception. This algorithm applies value 
iteration to a belief state representation. The framework underlying 
this algorithm is called partially observable Markov decision processes 
(POMDPs). POMDP algorithms anticipate uncertainty, actively gather in- 
formation, and explore optimally in pursuit of an arbitrary performance 
goal. Chapter 15 also discusses a number of efficient approximations 
which can calculate control policies more efficiently. 


* Chapter 16 introduces a number of approximate value iteration algo- 
rithms for POMDPs. These algorithms have in common that they ap- 
proximate the probabilistic planning process, so as to gain computational 
efficiency. One of these algorithms will shortcut probabilistic planning by 
assuming that at some point in the future, the state becomes fully observ- 
able. Another compresses the belief state into a lower dimensional rep- 
resentation, and plans using this representation. A third algorithm uses 
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Figure 14.1 Near-symmetric environment with narrow and wide corridors. The 
robot starts at the center with unknown orientation. Its task is to move to the goal 
location on the left. 


particle filters and a machine learning approach to condense the problem 
space. All three of these algorithms yield considerable computational im- 
provements while still performing well in practical robotic applications. 


* Chapter 17 addresses the specialized problem of robot exploration. Here 
the robot's goal is to accumulate information about its environment. 
While exploration techniques address the problem of sensor uncertainty, 
the problem is significantly easier than the full POMDP problem, and 
hence can be solved much more efficiently. Probabilistic exploration tech- 
niques are popular in robotics, as robots are frequently used for acquiring 
information about unknown spaces. 


Uncertainty in Action Selection 


Figure 14.1 shows a toy-like environment that we will use to illustrate the 
different types of uncertainty a robot may encounter. Shown in this figure 
is a mobile robot in a corridor-like environment. The environment is highly 
symmetric; the only distinguishing feature are its far ends, which are shaped 
differently. The robot starts at the location indicated, and it seeks to reach the 
location labeled goal. We notice there exist multiple paths to the goal, one 
that is short but narrow, and two others that are longer but wider. 

In the classical robot planning paradigm, there is no uncertainty. The robot 
would simply know its initial pose and the location of the goal. Further- 
more, actions when executed in the physical world have predictable effects, 
and such effects can be pre-planned. In such a situation, there is no need 
to sense. It suffices to plan off-line a single sequence of actions, which can 
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Figure 14.2 The value function and control policy for an MDP with (a) deterministic 
and (b) nondeterministic action effects. Under the deterministic model, the robot is 
perfectly fine to navigate through the narrow path; it prefers the longer path when 
action outcomes are uncertain, to reduce the risk of colliding with a wall. Panel (b) 
also shows a path. 


then be executed at run-time. Figure 14.1 shows an example of such a plan. 
Obviously, in the absence of errors in the robot's motion, the narrow shorter 
path is superior to any of the longer, wider ones. Hence, a "classical" planner 
would choose the former path over the latter. 

In practice, such plans tend to fail, for more than one reason. A robot 
blindly following the narrow hallway runs danger of colliding with the 
walls. Furthermore, a blindly executing robot might miss the goal location 
because of the error it accrued during plan execution. In practice, thus, plan- 
ning algorithms of this type are often combined with a sensor-based, reactive 
control module that consults sensor readings to adjust the plan so as to avoid 
collisions. Such a module might prevent the robot from colliding in the nar- 
row corridor. However, in order to do so it may have to slow down the robot, 
making the narrow path inferior to the wider, longer path. 

A paradigm that encompasses uncertainty in robot motion is known as 
Markov decision processes, or MDPs. MDPs assume that the state of the en- 
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Figure 14.3 Knowledge gathering actions in POMDPs: To reach its goal with more 
than 50% chance, the belief space planner first navigates to a location where the global 
orientation can be determined. Panel (a) shows the corresponding policy, and a pos- 
sible path the robot may take. Based on its location, the robot will then find itself in 
panel (b) or (c), from where it can safely navigate to the goal. 


vironment can be fully sensed at all times. In other words, the perceptual 
model p(z | x) is deterministic and bijective. However, the MDP framework 
allows for stochastic action effects. The action model p(z' | u, x) may be 
non-deterministic. As a consequence, it is insufficient to plan a single se- 
quence of actions. Instead, the planner has to generate actions for a whole 
range of situations that the robot might find itself in, because of its actions or 
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other unpredictable environment dynamics. Our way to cope with the result- 
ing uncertainty shall be to generate a policy for action selection defined for all 
states that the robot might encounter. Such mappings from states to actions 
are known under many names, such as control policy, universal plans, and 
navigation functions. An example of a policy is shown in Figure 14.2. Instead 
of a single sequence of actions, the robot calculates a mapping from states 
to actions indicated by the arrows. Once such a mapping is computed, the 
robot can accommodate non-determinism by sensing the state of the world, 
and acting accordingly. The panel in Figure 14.2a shows a policy for a robot 
with nearly no motion uncertainty, in which cases the narrow path is indeed 
acceptable. Figure 14.2b depicts the same situation for an increased random- 
ness in robot motion. Here the narrow path makes a collision more likely, 
and the detour becomes preferable. This example illustrates two things: the 
importance of incorporating uncertainty in the motion planning process; and 
the ability of finding a good control policy using the algorithm described in 
this chapter. 

Let us now return to the most general, fully probabilistic case, by drop- 
ping the assumption that the state is fully observable. This case is known 
as partially observable Markov decision processes, or POMDPs. In most if not 
all robotics applications, measurements z are noisy projections of the state zx. 
Hence, the state can only be estimated up to a certain degree. To illustrate 
this, consider once again our example, but under different assumptions. As- 
sume that the robot knows its initial location but does not know whether it 
is oriented towards the left or to the right. Further, it has no sensor to sense 
whether it has arrived at the goal location. 

Clearly, the symmetry of the environment makes it difficult to disam- 
biguate the orientation. By moving directly towards the projected goal state 
the robot faces a 50% chance of missing the goal, and instead moving to the 
symmetric location on the right size of the environment. The optimal plan, 
thus, is to move to any of the corners of the environment, which is perceptu- 
ally sufficiently rich to disambiguate its orientation. A policy for moving to 
these locations is shown in Figure 14.3a. Based on its initial orientation, the 
robot may execute any of the two paths shown there. As it reaches a corner, 
its sensors now reveal its orientation, and hence its actual location relative 
to the environment. The robot might now find itself in any of the two situ- 
ations depicted in Figures 14.3b&c, from where it can safely navigate to the 
goal location as shown. 

This example illustrates one of the key aspects of probabilistic robotics. 
The robot has to actively gather information, and in order to do so, might be 


»js.on 000000 


494 


14 Markov Decision Processes 


suffering a detour relative to a robot that knows its state with absolute cer- 
tainty. This problem is paramount in robotics. For nearly any robotics task, 
the robot's sensors are characterized by intrinsic limitations as to what the 
robot can know, and where information can be acquired. Similar situations 
occur in tasks as diverse as locate-and-retrieve, planetary exploration, urban 
search-and-rescue, and so on. 

The question now arises as to how can one devise an algorithm for action 
selection that can cope with this type of uncertainty. As we will learn, this is 
not a trivial question. One might be tempted to solve the problem of what to 
do by analyzing each possible situation that might be the case under the cur- 
rent state of knowledge. In our example, there are two such cases: the case 
where the goal is on the upper left relative to the initial robot heading, and 
the case where the goal is on the lower right. In both these cases, however, 
the optimal policy does not bring the agent to a location where it would be 
able to disambiguate its pose. That is, the planning problem in a partially 
observable environment cannot be solved by considering all possible envi- 
ronments and averaging the solution. 

Instead, the key idea is to generate plans in the belief space, a synonym for 
information space. The belief space comprises the space of all posterior beliefs 
that the robot might hold about the world. The belief space for our simple 
example corresponds to the three panels in Figure 14.3. The top panel shows 
such a belief space policy. It displays the initial policy while the robot is 
unaware of its orientation. Under this policy, the robot navigates to one of 
the corners of the environment where it can localize. Once localized, it can 
safely navigate to the target location, as illustrated in the two lower panels 
of Figure 14.3. Since the a priori chance of each orientation is the same, the 
robot will experience a random transition with a 50% chance of ending up in 
either of the two bottom diagrams. 

In our toy example, the number of different belief states happens to be 
finite: either the robot knows or it does not have a clue. In practical appli- 
cations, this is usually not the case. In worlds with finitely many states the 
belief space is usually continuous, but of finite dimensionality. In fact, the 
number of dimensions of the belief space is of the same order as the number 
of states in the underlying state space. If the state space is continuous, the 
belief space possesses infinitely many dimensions. 

This example illustrates a fundamental property that arises from the 
robot’s inability to perfectly sense the state of the world—one whose im- 
portance for robotics has often been under-appreciated. In uncertain worlds 
a robot planning algorithm must consider the state of its knowledge when 
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making control decisions. In general it does not suffice to consider the most 
likely state only. By conditioning the action on the belief state—as opposed 
to the most likely actual state—the robot can actively pursue information 
gathering. In fact, the optimal plan in belief state gathers information “opti- 
mally,” in that it only seeks new information to the extent that it is actually 
beneficial to the expected utility of the robot’s action. The ability to devise 
optimal control policies is a key advantage of the probabilistic approach to 
robotics over the classical deterministic, omniscient approach. However, as 
we shall see soon, it comes at the price of an increased complexity of the 
planning problem. 


Value Iteration 


Our first algorithm for finding control policies is called value iteration. Value 
iteration recursively calculates the utility of each action relative to a payoff 
function. Our discussion in this chapter will be restricted to the first type of 
uncertainty: stochasticity of the robot and the physical world. We defer our 
treatment of uncertainty arising from sensor limitations to the subsequent 
chapter. Thus, we will assume that the state of the world is fully observable 
at any point in time. 


Goals and Payoff 


Before describing a concrete algorithm, let us first define the problem in 
more concise terms. In general, robotic action selection is driven by goals. 
Goals might correspond to specific configurations (e.g., a part has success- 
fully been picked up and placed by a robot manipulator), or they might ex- 
press conditions over longer periods of time (e.g., a robot balances a pole). 
In robotics, one is sometimes concerned with reaching a specific goal config- 
uration, while simultaneously optimizing other variables, often thought of 
as cost. For example, one might be interested in moving the end-effector of 
a manipulator to a specific location, while simultaneously minimizing time, 
energy consumption, or the number of collisions with obstacles. 

At first glance, one might be tempted to express these desires by two quan- 
tities, one that is being maximized (e.g., the binary flag that indicates whether 
or not a robot reached its goal location), and the other one being minimized 
(e.g., the total energy consumed by the robot). However, both can be ex- 
pressed using a single function called the payoff function. 


5rjs.cn 000000 


496 


(14.1) 


POLICY 


(14.2) 


(14.3) 


14 Markov Decision Processes 


The payoff, denoted r, is a function of the state and the robot control. For 
example, a simple payoff function may be the following: 


pub +100 if u leads to a goal configuration or state 
i i —1 otherwise 


This payoff function "rewards" the robot with 4-100 if a goal configuration is 
attained, while it “penalizes” the robot by —1 for each time step where it has 
not reached that configuration. Such a payoff function will yield maximum 
cumulative return if the robot reaches its goal configuration in the minimum 
possible time. 

Why use a single payoff variable to express both goal achieval and costs? 
This is primarily because of two reasons: First, the notation avoids clutter 
in the formulae yet to come, as it shall unify our treatment of costs and goal 
achieval throughout this book. Second, and more importantly, it pays tribute 
to the fundamental trade-off between goal achievement and costs along the 
way. Since robots are inherently uncertain, they cannot know with certainty 
as to whether a goal configuration has been achieved; instead, all one can 
hope for is to maximize the chances of reaching a goal. This trade-off be- 
tween goal achieval and cost is characterized by questions like Is increasing 
the probability of reaching a goal worth the extra effort (e.g., in terms of energy, 
time)? Treating both goal achieval and costs as a single numerical factor 
enables us to trade off one against the other, hence providing a consistent 
framework for selecting actions under uncertainty. 

We are interested in devising programs that generate actions so as to opti- 
mize future payoff in expectation. Such a program is usually referred to as a 
control policy, or simply policy. It will be denoted as follows: 


T : Ziu—1,;U1:—1 — Ut 
In the case of full observability, we assume the much simpler case: 
T 2 Xt — Ut 


Thus, a policy 7 is a function that maps past data into controls, or states into 
controls when the state is observable. 

So far, our definition of a control policy makes no statements about its 
computational properties. It might be a fast, reactive algorithm that bases 
its decision on the most recent data item only, or an elaborate planning al- 
gorithm. In practice, however, computational considerations are essential, in 
that any delay in calculating a control may negatively affect a robot's perfor- 
mance. Our definition of a policy 7 also makes no commitment as to whether 
it is deterministic or non-deterministic. 
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An interesting concept in the context of creating control policies is the plan- 
ning horizon. Sometimes, it suffices to choose a control action so as to maxi- 
mize the immediate next payoff value. Most of the time, however, an action 
might not pay off immediately. For example, a robot moving to a goal lo- 
cation will receive the final payoff for reaching its goal only after the very 
last action. Thus, payoff might be delayed. An appropriate objective is then 
to choose actions so that the sum of all future payoff is maximal. We will 
call this sum the cumulative payoff. Since the world is non-deterministic, the 
best one can optimize is the expected cumulative payoff, which is conveniently 
written as 


Rr = E 





T 
T. 


Here the expectation E| | is taken over future momentary payoff values rz+r 
which the robot might accrue between time t and time t + T. The individual 
payoffs r++- are multiplied by a factor 77, called the discount factor. The value 
of is a problem-specific parameter, and it is constrained to lie in the interval 
[0;1]. Ify = 1, we have 77 = 1 for arbitrary values of 7, and hence the 
factor can be omitted in Equation (14.4). Smaller values of 7 discount future 
payoff exponentially, making earlier payoffs exponentially more important 
than later ones. This discount factor, whose importance will be discussed 
later, bears resemblance to the value of money, which also loses value over 
time exponentially due to inflation. 

We notice that Rr is a sum of T time steps. T is called the planning horizon, 
or simply: horizon. We distinguish three important cases: 


1. T = 1. This is called the greedy case, where the robot only seeks to min- 
imize the immediate next payoff. While this approach is degenerate in 
that it does not capture the effect of actions beyond the immediate next 
time step, it nevertheless plays an important role in practice. The reason 
for its importance stems from the fact that greedy optimization is much 
simpler than multi-step optimization. In many robotics problems, greedy 
algorithms are currently the best known solutions that can be computed 
in polynomial time. Obviously, greedy optimization is invariant with re- 
spect to the discount factor y, but it requires that y > 0. 


2. T larger than 1 but finite. This case is known as the finite-horizon case. Typ- 
ically, the payoff is not discounted over time, thus 7 — 1. One might argue 
that the finite-horizon case is the only one that matters, since for all prac- 
tical purposes time is finite. However, finite-horizon optimality is often 
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harder to achieve than optimality in the discounted infinite-horizon case. 
Why is this so? A first insight stems from the observation that the optimal 
control action is a function of time horizon. Near the far end of the time 
horizon, for example, the optimal policy might differ substantially from 
the optimal choice earlier in time, even under otherwise identical condi- 
tions (e.g., same state, same belief). As a result, planning algorithms with 
finite horizon are forced to maintain different plans for different horizons, 
which can add undesired complexity. 


. T is infinite. This case is known as the infinite-horizon case. This case does 


not suffer the same problem as the finite horizon case, as the number of 
remaining time steps is the same for any point in time (it is infinite!). How- 
ever, here the discount factor y is essential. To see why, let us consider the 
case where we have two robot control programs, one that earns us $1 per 
hour, and another one that makes us $100 per hour. In the finite hori- 
zon case, the latter is clearly preferable to the former. No matter what 
the value of the horizon is, the expected cumulative payoff of the second 
program exceeds that of the first by a factor of a hundred. Not so in the 
infinite horizon case. Without discounting, both programs will earn us an 
infinite money, rendering the expected cumulative payoff Rr insufficient 
to select the better program. 


Under the assumption that each individual payoff r is bounded in mag- 
nitude (that is, |r] < Tmax for some value rmax), discounting guarantees 
that Ræ is finite—despite the fact that the sum has infinitely many terms. 
Specifically, we have 





Roo < Tmax max + Y max HY max t = 


This shows that Rə is finite as long as y is smaller than 1. As an aside, we 
note that an alternative to discounting involves maximizing the average 
payoff instead of the total payoff. Algorithms for maximizing average 
payoff will not be studied in this book. 


Sometimes we will refer to the cumulative payoff Rr conditioned on a state 
xı. This will be written as follows: 





T 
> Y Thes | Tt 
T-1 
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The cumulative payoff Rr is a function of the robot's policy for action selec- 
tion. Sometimes, it is beneficial to make this dependence explicit: 


T 
Ryl) = E XO Yrer | Ut4r = T(21:t+r—1; Ul:t+r—1) 

F=1 
This notation enables us to compare two control policies 7 and 7’, and deter- 
mine which one is better. Simply compare RẸ to RE and pick the algorithm 
with higher expected discounted future payoff! 


Finding Optimal Control Policies for the Fully Observable Case 


This chapter shall be concluded with a concreted value iteration algorithm, 
for calculating control policies in fully observable domains. At first glance, 
such algorithms depart from the basic assumption of probabilistic robotics, 
namely that state is not observable. However, in certain applications one can 
safely assume that the posterior p(x; | z14, u14) is well-represented by its 
mean E [p(z; | 21-4, u14)]. 

The fully observable case has some merit. The algorithm discussed in turn 
also prepares us for the more general case of partial observability. 

We already noted that the framework of stochastic environments with fully 
observable state is known as Markov decision processes. Policies, in MDPs, 
are mappings from state to control actions: 


T:r Uu 


The fact that the state is sufficient for determining the optimal control is a 
direct consequence of our Markov assumption that was discussed in length 
in Chapter 2.4.4. The goal of planning in the MDP framework is to identify 
the policy 7 that maximizes the future cumulative payoff. 

Let us begin with defining the optimal policy for a planning horizon of T = 
1, hence we are only interested in a policy that maximizes the immediate next 
payoff. This policy shall be denoted 7; (x) and is obtained by maximizing the 
expected 1-step payoff over all controls: 


m(x) = argmax r(z,u) 


Thus, an optimal action is one that maximizes the immediate next payoff in 
expectation. The policy that chooses such an action is optimal in expectation. 

Every policy has an associated value function, which measures the expected 
value (cumulative discounted future payoff) of this specific policy. For 71, 
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the value function is simply the expected immediate payoff, discounted by 
the factor ^: 


Vilx) = y max r(z,u) 


This value for longer planning horizons is now defined recursively. The op- 
timal policy for horizon T = 2 selects the control that maximizes the sum of 
the one-step optimal value V; (x) and the immediate 1-step payoff: 


uU 


m(z) = argmax [res Í Vi (z^) p(' | u, 2) aw! 


It should be immediately obvious why this policy is optimal. The value of 
this policy conditioned on the state x is given by the following discounted 
expression: 


Voz) = 4 max [resa | nane ten 24 


The optimal policy and its value function for T' — 2 was constructed recur- 
sively, from the optimal value function for T = 1. This observation suggests 
that for any finite horizon T the optimal policy, and its associated value func- 
tion, can be obtained recursively from the optimal policy and value function 
Tram. 


nrls) = argmax rou) + f e plz | u, x) a 


The resulting policy mr (x) is optimal for the planning horizon T. The associ- 
ated value function is defined through the following recursion: 


Vr(zr) = y max [rns u) 4 ] e p(x’ | u, 2) a] 
In the infinite horizon case, the optimal value function tends to reach the 


equilibrium (with the exception of some rare deterministic systems, in which 
no such equilibrium exists): 


Va(r) = Y max [rns u) 十 f «e p(x’ | uz) a 
This invariance is known as Bellman equation. Without proof, we notice that 


every value function V that satisfies the condition (14.15) is both necessary 
and sufficient for the induced policy to be optimal. 
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Computing the Value Function 


This consideration leads to the definition of a practical algorithm for calcu- 
lating the optimal policy in stochastic systems with full state observability. 
Value iteration does this by successively approximating the optimal value 
functions, as defined in (14.15). 

In detail, let us denote out value function approximation by V. Initially, V 
is set to rii, the minimum possible immediate payoff: 


V « Tmin 


Value iteration then successively updates the approximation via the follow- 
ing recursive rule, which computes the value function for increasing hori- 
zons: 


Va) — ymax bent [vepe baa de! 


Since each update propagates information in reverse temporal order through 
the value function, it is usually referred to as the backup step. 

The value iteration rule bears close resemblance to the calculation of the 
horizon-T optimal policy above. Value iteration converges if y < 1 and, in 
some special cases, even for y = 1. The order in which states are updated in 
value iteration is irrelevant, as long as each state is updated infinitely often. 
In practice, convergence is usually observed after a much smaller number of 
iterations. 

At any point in time, the value function V (4) defines a policy: 


n(x) = argmax [rs + i V (a^) p(x | u, £) da! 


u 
After convergence of value iteration, the policy that is greedy with respect to 
the final value function is optimal. 

We note that all of these equations have been formulated for general state 
spaces. For finite state spaces, the integral in each of these equations can be 
implemented as a finite sum over all states. This sum can often be calculated 
efficiently, since p(x’ | u, x) will usually be non-zero for a relatively few states 
x and x’. This leads to an efficient family of algorithms for calculating value 
functions. 

Table 14.1 shows three algorithms: The general value iteration algo- 
rithm MDP value iteration for arbitrary state and action spaces; its discrete 
variant for finite state spaces MDP discrete value iteration, and the algo- 
rithm for retrieving the optimal control action from the value function, pol- 
icy MDP. 
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1: Algorithm MDP value iteration( ): 
2: for all x do 
3: V (a) =fmin 
4: endfor 
5: repeat until convergence 
6: for all x 
7: V(x) = max [res] + [ve p(x’ | u, x) da’ 
8: endfor 
9: endrepeat 
10: return V 
1: Algorithm MDP discrete value iteration( ): 
2: fori = 1 to N do 
3: V (25) — fT'inin 
4: endfor 
5: repeat until convergence 
6: fori = 1 to N do 
A N ^ 
7: P (s) =y max |r(5 2) + 92 Plaz) ple; | usas) 
j=l 
8: endfor 
9: endrepeat 
10: return V 
1: Algorithm policy_MDP(z, V): 
N A 
2: return argmax |r(z,u) + 5 V(a;) p(x; | u, xi) 
u j=l 











Table 14.1 The value iteration algorithm for MDPs, stated here in its most general 
form and for MDPs with finite state and control spaces. The bottom algorithm com- 
putes the best control action. 
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Figure 14.4 An example of an infinite-horizon value function Tə, assuming that the 
goal state is an "absorbing state." This value function induces the policy shown in 
Figure 14.2a. 


The first algorithm MDP value iteration initializes the value function in 
line 3. Line 5 through 9 implement the recursive calculation of the value 
function. Once value iteration converges, the resulting value function V in- 
duces the optimal policy. If the state space is finite, the integral is replaced 
by a finite sum, as shown in MDP discrete value iteration. The factor 7 is 
the discount factor. The algorithm policy MDP processes the optimal value 
function along with a state x, and returns the control u that maximizes the 
expected value. 

Figure 14.4 depicts an example value function, for our example discussed 
above. Here the shading of each grid cell corresponds to its value, with white 
being V — 100 and black being V — 0. Hill climbing in this value function 
using Equation 14.18 leads to the policy shown in Figure 142a. 


Application to Robot Control 


The simple value iteration algorithm is applicable to low-dimensional robot 
motion planning and control problems. To do so, we have to introduce two 
approximations. 

First, the algorithm in Table 14.5 defines a value function over a continuous 
space, and requires maximization and integration over a continuous space. 
In practice, it is common to approximate the state space by a discrete decom- 
position, similar to our histogram representations in Chapter 4.1. Similarly, 
itis common to discretize the control space. The function V is then easily im- 
plemented as a look-up table. However, such a decomposition works only 
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(b) 























Figure 14.5 Example of value iteration over state spaces in robot motion. Obstacles 
are shown in black. The value function is indicated by the gray shaded area. Greedy 
action selection with respect to the value function lead to optimal control, assuming 
that the robot’s pose is observable. Also shown in the diagrams are example paths 
obtained by following the greedy policy. 


for low-dimensional state and control spaces, due to the curse of dimension- 
ality. In higher dimensional situations, it is common to introduce learning 
algorithms to represent the value function. 

Second, we need a state! As noted above, it might be viable to replace the 
posterior by its mode 


Tt = 五 [p(zz | z14, U1:4)] 


In the context of robot localization, for example, such an approximation 
works well if we can guarantee that the robot is always approximately local- 
ized, and the residual uncertainty in the posterior is local. It ceases to work 
when the robot performs global localization, or if it is being kidnapped. 

Figure 14.5 illustrates value iteration in the context of a robotic path plan- 
ning problem. Shown there is a two-dimensional projection of a configu- 
ration space of a circular robot. The configuration space is the space of all 
(x,y, 0) coordinates that the robot can physically attain. For circular robots, 
the configuration space is obtained by “growing” the obstacles in the map 
by the radius of the robot. These increased obstacles shown in black in Fig- 
ure 14.5. 

The value function is shown in gray, where the brighter a location, the 
higher its value. The path obtained by following the optimal policy leads to 
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Figure 14.6 (a) 2-DOF robot arm in an environment with obstacles. (b) The configu- 
ration space of this arm: the horizontal axis corresponds to the shoulder joint, and the 
vertical axis to its elbow joint configuration. Obstacles are shown in gray. The small 
dot in this diagram corresponds to the configuration on the left. 


the respective goal location, as indicated in Figure 14.5. The key observation 
is that the value function is defined over the entire state space, enabling the 
robot to select an action no matter where it is. This is important in non- 
deterministic worlds, where actions have stochastic effects on the robot’s 
state. 

The path planner that generated Figure 14.5 makes specific assumptions 
in order to keep the computational load manageable. For circular robots 
that can turn on the spot, it is common to compute the value function in the 
two-dimensional Cartesian coordinates only, ignoring the cost of rotation. 
We also ignore state variables such as the robot's velocity, despite the fact 
that velocity clearly constrains where a robot can move at any given point in 
time. To turn such a control policy into actual robot controls, it is therefore 
common practice to combine such path planners with fast, reactive collision 
avoidance modules that generate motor velocities while obeying dynamic 
constraints. A path planner that considers the full robot state would have to 
plan in at least five dimensions, comprising the full pose (three dimensions), 
the translational and the rotational velocity of the robot. In two dimensions, 
calculating the value function for environment like the one above takes only 
a fraction of a second on a low-end PC. 
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Figure 14.7 (a) Value iteration applied to a coarse discretization of the configura- 
tion space. (b) Path in workspace coordinates. The robot indeed avoids the vertical 
obstacle. 


bf 





Figure 14.8 (a) Probabilistic value iteration, here over a fine-grained grid. (b) The 
corresponding path. 
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A second example is illustrated in Figure 14.6a. Shown there is a robot 
arm model with two rotational degrees of freedom (DOFs), a shoulder and 
an elbow joint. Determining the exact configuration of these joints is usu- 
ally possible, through shaft encoders that are attached to the joints. Hence, 
the approximation in (14.19) is a valid one. However, robot arm motion is 
usually subject to noise, and as such the control noise should be taken into 
account during planning. This makes arm control a primary application for 
probabilistic MDP-style algorithms. 

Robot arm motion is usually tackled in the configuration space. The configu- 
ration space for the specific arm is shown in Figure 14.6b. Here the horizontal 
axis graphs the orientation of the shoulder, and the vertical orientation that of 
the elbow. Each point in this diagram, thus, corresponds to a specific configu- 
ration. In fact, the small dot in Figure 14.6b corresponds to the configuration 
shown in Figure 14.6a. 

It is common to decompose the configuration space into areas in which 
the robot can move, and areas where it would collide. This is shown in 
Figure 14.6b. The white area in this figure corresponds to the collision-free 
configuration space, commonly called freespace. The black boundary in the 
configuration space is the constraint imposed by the table and the enclosing 
case. The vertical obstacle, protruding into the robot’s workspace from above 
in Figure 14.6a, corresponds to the light gray obstacle in the center of Fig- 
ure 14.6b. This figure is not at all obvious, and the reader may take a minute 
to visualize configurations in which the robot collides with this obstacle. 

Figure 14.7a shows the result of value iteration using a coarse discretiza- 
tion of the configurations space. Here the value is propagated using a deter- 
ministic motion model and the resulting path is also shown. When executed, 
this policy leads to a motion shown in Figure 14.7b. Figure 14.8 shows a re- 
sult for a probabilistic motion model, with the resulting motion of the arm. 
Again, this is the result of applying value iteration under the assumption 
that the configuration of the robot arm is fully observable—a rare instance in 
which this assumption is valid! 


Summary 


This chapter introduced the basic framework of probabilistic control. 


* We identified the two basic types of uncertainty a robot may face: uncer- 
tainty with regards to controls, and uncertainty in perception. The former 
makes it difficult to determine what lies ahead, whereas the latter makes 
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it hard to determine what is. Uncertainty from unpredictable events in 
the environment was silently subsumed into this taxonomy. 


* We defined the control objective through a payoff function, which maps 
states and controls to a goodness value. The payoff enables us to express 
performance goals as well as costs of robot operation. The overall con- 
trol objective is the maximization of all payoff, immediate and at future 
points in time. To avoid possibly infinite sums, we introduced a so-called 
discount factor that exponentially discounts future payoffs. 


* We discussed an approach to solving probabilistic control problems by 
devising a control policy. A control policy defines the control action that 
is to be chosen, as a function of the robot's information about the world. A 
policy is optimal if it maximizes the sum of all future cumulative payoffs. 
The policy is computed in a planning phase that precedes robot operation. 
Once computed, it specifies the optimal control action for any possible 
situation a robot may encounter. 


* We devised a concrete algorithm for finding the optimal control policy for 
the restricted case of fully observable domains, in which the state is fully 
observable. Those domains are known as Markov decision processes. 
The algorithm involved the calculation of a value function, which mea- 
sures the expected cumulative payoff. A value function defines a policy 
of greedily selecting the control that maximizes value. If the value func- 
tion is optimal, so is the policy. The value iteration algorithm successively 
improved the value function by updating it recursively. 


* We discussed applications of MDP value iteration to probabilistic robotics 
problems. For that we extracted the mode of the belief as the state, and ap- 
proximated the value function by a low-dimensional grid. The result was 
a motion planning algorithm for stochastic environments, which enables 
a robot to navigate even if its action effects are nondeterministic. 


The material in this chapter lays the groundwork for the next chapter, in 
which we will tackle the more general problem of control under measure- 
ment uncertainty, also known as the partially observable MDP problem. We 
already intuitively discussed why this problem is much harder than the MDP 
problem. Nevertheless, some of the intuitions and basic algorithms carry 
over to this case. 

We close this chapter by remarking that there exist a number of alterna- 
tive techniques for probabilistic planning and control under uncertainty. Our 
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choice of value iteration as the basic method is due to its popularity; further, 
value iteration techniques are among the best-understood techniques for the 
more general POMDP case. 

Value iteration is by no means the most effective algorithm for generating 
control. Common planning algorithms include the A* algorithm, which uses 
a heuristic in the computation of the value function, or direct policy search 
techniques that identify a locally optimal policy through gradient descent. 
However, value iteration plays a pivotal role in the next chapter, when we 
address the much harder case of optimal control under sensor uncertainty. 


Bibliographical Remarks 


The idea of dynamic programming goes back to Bellman (1957) and Howard (1960). Bellman 
(1957) identified the equilibrium equation for value iteration, which has henceforth been called 
the Bellman equation. Markov decision processes (MDPs) with incomplete state estimation were 
first discussed by Astrom (1965); see also Mine and Osaki (1970) for early work on MDPs. Since 
then, dynamic programming for control has been a vast field, as a recent book on this topic 
attests (Bertsekas and Tsitsiklis 1996). Recent improvements to the basic paradigm include tech- 
niques for real-time value iteration (Korf 1988), value iteration guided through the interaction 
with an environment (Barto et al. 1991), model free value iteration (Watkins 1989), and value 
iteration with parametric representation of the value function (Roy and Tsitsiklis 1996; Gor- 
don 1995), or using trees (Moore 1991) (see also (Mahadevan and Kaelbling 1996)). Hierarchical 
value iteration techniques have been developed by Parr and Russell (1998) and Dietterich (2000), 
and Boutilier et al. (1998) improved the efficiency of MDP value iteration by reachability anal- 
ysis. There exists also a rich literature on applications of value iteration, for example work by 
Barniv (1990) on moving target detection. In the light of this rich literature, the material in this 
chapter is a basic exposition of the most simple value iteration techniques, in preparation of the 
techniques described in the chapters that follow. 

Within robotics, the issue of robot motion planning has typically been investigated in a non- 
probabilistic framework. As noted, the assumption is usually that the robot and its environ- 
ment is perfectly known, and controls have deterministic effects. Complications arise from the 
fact that the state space is continuous and high-dimensional. The standard text in this field is 
Latombe (1991). It is predated by seminal work on a number of basic motion planning tech- 
niques, visibility graphs (Wesley and Lozano-Perez 1979), potential field control (Khatib 1986), 
and Canny’s (1987) famous silhouette algorithm. Rowat (1979) introduced the idea of Voronoi 
graphs into the field of robot control, and Guibas et al. (1992) showed how to compute them 
efficiently. Choset (1996) developed this paradigm into a family of efficient online exploration 
and mapping techniques. Another set of techniques used randomized (but not probabilistic!) 
techniques for searching the space of possible robot paths (Kavraki and Latombe 1994). A con- 
temporary book on this topic is due to Choset et al. (2004). 

In the language of the robotic motion planning field, the methods discussed in this chapter 
are approximate cell decompositions which, in the deterministic case, provide no guarantee of com- 
pleteness. Decompositions of continuous spaces into finite graphs have been studied in robotics 
for decades. Reif (1979) developed a number of techniques for decomposing continuous spaces 
into finitely many cells that retained completeness in motion planning. The idea of configuration 
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spaces, necessary to check collision with the techniques described in this chapter, was originally 
proposed by Lozano-Perez (1983). Recursive cell decomposition techniques for configuration 
space planning can be found in Brooks and Lozano-Perez (1985), all under the non-probabilistic 
assumptions of perfect world models and perfect robots. Acting under partial knowledge has 
been addressed by Goldberg (1993), who in his Ph.D. thesis developed algorithms for orienting 
parts in the absence of sensors. Sharma (1992) developed robot path planning techniques with 
stochastic obstacles. 

Policy functions that assign to every possible robot state a control action are known as con- 
trollers. An algorithm for devising a control policy is often called optimal controller (Bryson and 
Yu-Chi 1975). Control policies are also known as navigation function (Koditschek 1987; Rimon 
and Koditschek 1992). Within AI, they are known as universal plans (Schoppers 1987), and a 
number of symbolic planning algorithms have addressed the problem of finding such universal 
plans (Dean et al. 1995; Kushmerick et al. 1995; Hoey et al. 1999). Some robotic work has been 
devoted to bridging the gap between universal plans and open-loop action sequences, such as 
in Nourbakhsh’s (1987) Ph.D. work. 

We also remark that the notion of “control” in this book is somewhat narrowly defined. The 
chapter deliberately did not address standard techniques in the rich field of control, such as PID 
control and other popular techniques often found in introductory textbooks (Dorf and Bishop 
2001). Clearly, such techniques are both necessary and applicable in many real-world robot 
systems. Our choice to omit them is based on space constraints, and on the fact that most of 
these techniques do not rely on explicit representations of uncertainty. 


Exercises 


1. The dynamic programming algorithm uses the most likely state to deter- 
mine its action. Can you draw a robot environment in which conditioning 
actions on the most likely state is fundamentally the wrong choice? Can 
you give a concise reason why this might sometimes be a poor choice? 


2. Suppose we run value iteration to completion, for a fixed cost function. 
Then the cost function changes. We would now like to adjust the value 
function by further iterations of the algorithm, using the previous value 
function as a starting point. 


(a) Is this a good or a bad idea? Does your answer depend on whether the 
cost increased or decreased? 


(b) Can you flesh out an algorithm that would be more efficient than sim- 
ply continuing value iteration after the cost changes? If you can, argue 
why your algorithm is more efficient. If not, argue why no such algo- 
rithm may exist. 


3. Heaven or Hell? In this exercise, you are asked to extend dynamic pro- 
gramming to an environment with a single hidden state variable. The 
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environment is a maze with a designated start marked “S,” and two pos- 
sible goal states, both marked "H." 





What the agent does not know is which of the two goal states provides a 
positive reward. One will give 4-100, whereas the other will give —100. 
There is a .5 probability that either of those situations is true. The cost 
of moving is —1; the agent can only move into the four directions north, 
south, east, and west. Once a state labeled "H" has been reached, the play 
is over. 


(a) Implement a value iteration algorithm for this scenario (and ignore the 
label ^X" in the figure). Have your implementation compute the value 
of the starting state. What is the optimal policy? 


(b) Modify your value algorithm to accommodate a probabilistic motion 
model: with .9 chance the agent moves as desired; with .1 chance it 
will select any of the three other directions at random. Run your value 
iteration algorithm again, and compute both the value of the starting 
state, and the optimal policy. 


(c) Now suppose the location labeled "X" contains a sign that informs the 
agent of the correct assignment of rewards to the two states labeled 
^H." How does this affect the optimal policy? 


(d) How can you modify your value iteration algorithm to find the optimal 
policy? Be concise. State any modifications to the space over which the 
value function is defined. 


(e) Implement your modification, and compute both the value of the start- 
ing state and the optimal policy. 
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Partially Observable Markov 
Decision Processes 


Motivation 


This chapter discusses algorithms for the partially observable robot control 
problem. These algorithms address both the uncertainty in measurement 
and the uncertainty in control effects. They generalize the value iteration al- 
gorithm discussed in the previous chapter, which was restricted to action ef- 
fect uncertainty. The framework studied here is known as partially observable 
Markov decision processes, or POMDPs. This name was coined in the opera- 
tions research literature. The term partial indicates that the state of the world 
cannot be sensed directly. Instead, the measurements received by the robot 
are incomplete and usually noisy projections of this state. 

As has been discussed in so many chapters of this book, partial observ- 
ability implies that the robot has to estimate a posterior distribution over 
possible world states. Algorithms for finding the optimal control policy exist 
for finite worlds, where the state space, the action space, the space of obser- 
vations, and the planning horizon T are all finite. Unfortunately, these exact 
methods are computationally involved. For the more interesting continuous 
case, the best known algorithms are approximate. 

All algorithms studied in this chapter build on the value iteration ap- 
proach discussed previously. Let us restate Equation (14.14), which is the 
central update equation in value iteration in MDPs: 


Vr(r) = y max [resa «f viae p(x | u, £) da’ 
with Vi(z) = ymax, r(z,u). In POMDPs, we apply the very same idea. 


However, the state x it not observable. The robot has to make its decision 
in the belief state, which is the space of posterior distributions over states. 
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Throughout this and the next chapters, we will abbreviate a belief by the 
symbol b, instead of the more elaborate bel used in previous chapters. 
POMDPs compute a value function over belief space: 


Vr(b) = ymax K u) + f vr) p(b' | u,b) ar 
with Vi (b) = ymax, E;[r(x, u)]. The induced control policy is as follows: 


wr(b) = argmax K + [ vie pb | u,b) ar 
K3 

A belief is a probability distribution; thus, each value in a POMDP is a func- 
tion of an entire probability distribution. This is problematic. If the state 
space is finite, the belief space is continuous, since it is the space of all distri- 
butions over the state space. Thus, there is a continuum of different values; 
whereas there was only a finite number of different values in the MDP case. 
The situation is even more delicate for continuous state spaces, where the 
belief space is an infinitely-dimensional continuum. 

An additional complication arises from the computational properties of 
the value function calculation. Equations (15.2) and (15.3) integrate over all 
beliefs b’. Given the complex nature of the belief space, it is not at all obvious 
that the integration can be carried out exactly, or that effective approxima- 
tions can be found. It should therefore come at no surprise that calculating 
the value function Vr is more complicated in belief space than it is in state 
space. 

Luckily, an exact solution exists for the interesting special case of finite 
worlds, in which the state space, the action space, the space of observations, 
and the planning horizon are all finite. This solution represents value func- 
tions by piecewise linear functions over the belief space. As we shall see, the 
linearity of this representation arises directly from the fact that the expecta- 
tion is a linear operator. The piecewise nature is the result of the fact that 
the robot has the ability to select controls, and in different parts of the belief 
space it will select different controls. All these statements will be proven in 
this chapter. 

This chapter discusses the general POMDP algorithm for calculating poli- 
cies defined over the space of all belief distributions. This algorithm is com- 
putationally cumbersome but correct for finite POMDPs; although a variant 
will be discussed that is highly tractable. The subsequent chapter will dis- 
cuss a number of more efficient POMDP algorithms, which are approximate 
but scale to actual robotics problems. 
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Figure 15.1 The two-state environment used to illustrate value iteration in belief 
space. 


An Illustrative Example 


Setup 


We illustrate value iteration in belief spaces through a numerical example. 
This example is simplistic, but by discussing it we identify all major elements 
of value iteration in belief spaces. 

Our example is the two-state world in Figure 15.1. The states are labeled 
zı and x2. The robot can choose among three different control actions, u1, 
uz, and ua. Actions u; and uz are terminal: When executed, they result in the 
following immediate payoff: 


r(v1,u1) = 一 100 r(a2,u1) = +100 

r(a1,U2) = +100 r(@2,U2) = 一 50 

The dilemma is that both actions provide opposite payoffs in each of the 
states. Specifically, when in state 71, uz is the optimal action, whereas it is u1 
in state x2. Thus, knowledge of the state translates directly into payoff when 
selecting the optimal action. 


To acquire such knowledge, the robot is provided with a third control ac- 
tion, u3. Executing this control comes at a mild cost of —1: 


r(v1,U3) = r(z2,u3) = 一 | 
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One might think of this as the cost of waiting, or the cost of sensing. Action 
u3 affects the state of the world in a non-deterministic manner: 


p(xj|xi,u3) = 0.2 p(x5|xi,ua) = 0.8 
p(x) |r2,u3) = 0.8 p(x5|r2,u3) = 0.2 


In other words, when the robot executes us, the state flips to the respective 
other state with 0.8 probability, and the robot pays a unit cost. 

Nevertheless, there is a benefit to executing action uz. Before each control 
decision, the robot can sense. By sensing, the robot gains knowledge about 
the state, and in turn it can make a better control decision that leads to higher 
payoff in expectation. The action ug lets the robot sense without committing 
to a terminal action. 

In our example, the measurement model is governed by the following 
probability distribution: 


p(alzi) = 0.7 p(z|mi) = 0.3 
plzıilz2) = 0.3 p(22|x2) 0.7 


Put differently, if the robot measures 21 its confidence increases for being in 
zı, and the same is the case for zx relative to x2. 

The reason for selecting a two-state example is that it makes it easy to 
graph functions over the belief space. In particular, a belief state b is charac- 
terized by p; = b(r4) and pz = b(r3). However, we know pa = 1 — pı, hence 
it suffices to graph pı. The corresponding control policy 7 is a function that 
maps the unit interval [0; 1] to the space of all actions: 


7:[06;1] — u 


Control Choice 


In determining when to execute what control, let us start our consideration 
with the immediate payoff for each of the three control choices, u1, u2, and 
uz. In the previous chapter, payoff was considered a function of state and 
actions. Since we do not know the state, we have to generalize the notion 
of a payoff to accommodate belief state. Specifically, for any given belief 
b = (pı, p2), the expected payoff under this belief is given by the following 
expectation: 


r(bu) = E,[r(v,u)] = p r(ziu) + po r(zz,u) 
The function r(b, u) defines the payoff in POMDPs. 
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Figure 15.2 Diagrams (a), (b), and (c) depict the expected payoff r as a function of 
the belief state parameter pı = b(z1), for each of the three actions u1, u», and us. 
(d) The value function at horizon T' — 1 corresponds to the maximum of these three 
linear functions. 


Figure 15.2a graphs the expected payoff r(b, u1) for control choice u1, pa- 
rameterized by the parameter pı. On the leftmost end of this diagram, we 
have p; = 0, hence the robot believes the world to be in state zz with absolute 
confidence. Executing action u; hence yields r(x2, ui) = 100, as specified in 
Equation (15.4). On the rightmost end, we have p; = 1, hence the state is x1. 
Consequently, control choice u; will result in r(z1, u1) = —100. In between, 
the expectation provides a linear combination of these two values: 


r(b,u1) = —100 pı +100p> = —100 pı + 100 (1— pi) 


This function is graphed in Figure 15.2a. 
Figures Figure 15.2b&c show the corresponding functions for action wz and 
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ua, respectively. For u», we obtain 

r(b,u2) = 100p — 50 (1 — pı) 

and for u3 we obtain the constant function 
r(bus) = -1p-—1l(l—pi) = -1 


Our first exercise in understanding value iteration in belief spaces will focus 
on the computation of the value function Vi, which is the value function that 
is optimal with regards to horizon T' — 1 decision processes. Within a single 
decision cycle, our robot can choose among its three different control choices. 
So which one should it choose? 

The answer is easily read off the diagrams studied thus far. For any be- 
lief state pı, the diagrams in Figures 15.2a-c graph the expected payoff for 
each of the action choices. Since the goal is to maximize payoff, the robot 
simply selects the action of highest expected payoff. This is visualized in 
Figure 15.2d: This diagram superimposes all three expected payoff graphs. 
In the left region, u1 is the optimal action, hence its value function dominates. 
The transition occurs when r(b, u1) = r(b, uz), which resolves to p; = 2. For 
values p; larger than 2, u» will be the better action. Thus the (T' — 1)-optimal 
policy is 


uj, ifp <2 
m(b) = 
u2 if pı >3 


The corresponding value is then the thick upper graph in Figure 15.2d. This 
graph is a piecewise linear, convex function. It is the maximum of the indi- 
vidual payoff functions in Figures 15.2a-c. Thus, we can write it as a maxi- 
mum over 3 functions: 


Vi(b = max r(b,u) 
—100p, +100 (1 — pi) (*) 
= max 100p;  —50(1— pi) (*) 


—1 


Obviously, only the linear functions marked (*) in (15.17) contribute. The 
remaining linear function can safely be pruned away: 


—100 pı +100 ecd 


vp c max f 100p  —50(1— pi) 
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We will use this pruning trick repeatedly in our example. Prunable linear 
constraints are shown as dashed lines in Figure 15.2d and many graphs to 
follow. 


Sensing 


The next step in our reasoning involves perception. What if the robot can 
sense before it chooses its control? How does this affect the optimal value 
function? Obviously, sensing provides information about the state, hence 
should enable the robot to choose a better control action. Specifically, for the 
worst possible belief thus far, p; = 2, the expected payoff in our example 
was 100 & 14.3, which is the value at the kink in Figure 15.2d. Clearly, if we 
can sense first, we find ourselves in a different belief after sensing. The value 
of this belief will be better than 14.3, but by how much? 

The answer is surprising. Suppose we sense z1. Figure 15.3a shows the 
belief after sensing zı as a function of the belief before sensing. Let us dissect 
this function. If our pre-sensing belief is pj = 0, our post-sensing belief is 
also p; = 0, regardless of the measurement. Similarly for p; = 1. Hence, at 
the extreme ends, this function is the identity. In between, we are uncertain 
as to what the state of the world is, and measuring zı does shift our belief. 
The amount it shifts is governed by Bayes rule: 


p, = p(m|z) 
E p(a | 21) p(21) 
p(z1) 
e 0.7 pi 
| pa) 
and 
" 0.3 (1 = pi) 
SR p(z1) 


The normalizer p(z,) adds the non-linearity in Figure 15.3a. In our example, 
it resolves to 


pia) = 07p-03(1—pi)) = 0.4p1 - 0.3 


and hence p, = e However, as we shall see below, this normalizer 


nicely cancels out. More on this in a minute. 
Let us first study the effect of this non-linear transfer function on the 
value function Vi. Suppose we know that we observed z1, and then have 
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Figure 15.3 The effect of sensing on the value function: (a) The belief after sensing 
zı as a function of the belief before sensing z1. Sensing zı makes the robot more con- 
fident that the state is xı. Projecting the value function in (b) through this nonlinear 
function results in the non-linear value function in (c). (d) Dividing this value func- 
tion by the probability of observing z1 results in a piecewise linear function. (e) The 
same piecewise linear function for measurement 22. (f) The expected value function 
after sensing. 
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to make an action choice. What would that choice be, and what would the 
corresponding value function look like? The answer is given graphically in 
Figure 15.3c. This figure depicts the piecewise linear value function in Fig- 
ure 15.3b, mapped through the nonlinear measurement function discussed 
above (and shown in Figure 15.3a). The reader may take a moment to get 
oriented here: Take a belief pı, map it to a corresponding belief p; according 
to our non-linear function, and then read off its value in Figure 15.3b. This 
procedure, for all p; € [0; 1], leads to the graph in Figure 15.3c. 
Mathematically, this graph is given by 


0.7 0.3 (1—p1) 
-100. $25 7100. “Se 





Vi(b| z1) = max 


100 - 0.7 pi 50 - 0.3 (1—p1) 


pa) 


= Ex Duns e MI 
| plz) 70p, —15(1—p) 





which is simply the result of replacing p; by p! in the value function V; spec- 
ified in (15.18). We note in Figure 15.3c that the belief of ^worst" value has 
shifted to the left. Now the worst belief is the one that, after sensing 21, 
makes us believe with 3 probability we are in state x1. 

However, this is the consideration for one of the two measurements only, 
the value before sensing has to take both measurements into account. Specif- 
ically, the value before sensing, denoted Vj, is given by the following expec- 
tation: 

2 
Vi(b) = EW(b|z) 25 (zi) VA (b | zi) 


We immediately notice that in this expectation, each contributing value func- 
tion Vi (b | z;) is multiplied by the probability p(z;), which was the cause of 
the nonlinearity in the pre-measurement value function. Plugging (15.19) 
into this expression gives us 


Aw) = Y play v RE m 


me p(zi) 
一 3 p(zi) X3 Vi(p(zi | x1) pı) 
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This transformation is true because each element in Vj is linear in 1/p(z;), 
as illustrated by example in (15.22). There we were able to move the factor 
1/p(z;) out of the maximization, since each term in the maximization is a 
product of this factor. After restoring the terms accordingly, the term p(z;) 
simply cancels out! 

In our example, we have two measurements, hence we can compute the 
expectation p(z;) Vi(b | z;) for each of these measurements. The reader may 
recall that these terms are added in the expectation (15.23). For z1, we already 
computed V. (b | z;) in (15.22), hence 


—70 pı +30 (1— pı) } 


p(z)Vi(b|z) = max{ 70p, -15 (1 — pı) 


This function is shown in Figure 15.3d: It is indeed the maximum of two 
linear functions. Similarly, for z2 we obtain 


—30 pı +70 (1— pı) \ 


p(z2) VA (b | z2) ST max { 30 pı 一 35 (1 — pı) 


This function is depicted in Figure 15.3e. 
The desired value function before sensing is then obtained by adding those 
two terms, according to Equation (15.23): 


—70 pı -30(1— pi) \ 4 mas { —30p, +70 (1— pı) ) 


We oc max | 70p, —15(1— pi) 30 p, —35(1— pi) 


This sum is shown in Figure 15.3f. It has a remarkable shape: Instead of a sin- 
gle kink, it possesses two different kinks, separating the value function into 
three different linear segments. For the left segment, u: is the optimal action, 
no matter what additional information the robot may reap through future 
sensing. Similarly for the right segment, uz is the optimal control action no 
matter what. In the center region, however, sensing matters. The optimal 
action is determined by what the robot senses. In doing so, the center seg- 
ment defines a value that is significantly higher than the corresponding value 
without sensing, shown in Figure 15.2d. Essentially, the ability to sense lifted 
an entire region in the value function to a higher level, in the region where 
the robot was least certain about the state of the world. This remarkable find- 
ing shows that value iteration in belief space indeed values sensing, but only 
to the extent that it matters for future control choices. 

Let us return to computing this value function, since it may appear easier 
than it is. Equation (15.27) requires us to compute the sum of two maxima 
over linear functions. Bringing this into our canonical form—which is the 
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maximum over linear functions without the sum—requires some thought. 
Specifically, our new value function V; will be bounded below by any sum 
that adds a linear function from the first max-expression to a linear function 
from the second max-expression. This leaves us with four possible combina- 
tions: 











—70 pı 430(1—p1) —30p, +70 (1—p1) 
= i —70 pı 430(1—p1) 430p —35(1— pı) 
NE 70p, —15(1—p1) —30p, +70 (1- pı) 

70 pi —15 (1 — p) +30 pi 一 35 (1 — pi) 

一 100 pı +100 (1 — pı) (x) 

—40 pı —5 (1 — pı) 
= max 
40 pı +55 (1-— pı) (*) 
100 pı  —50(1— pı) (=) 
~100 pı +100 (1— p) 
= max 40 pı +55(1— =P) 


Once again, we use (*) to denote constraints that actually contribute to the 
definition of the value function. As shown in Figure 15.3f, only three of these 
four linear functions are required, and the fourth can safely be pruned away. 


Prediction 


Our final step concerns state transitions. When the robot selects an action, its 
state changes. To plan at a horizon larger than T = 1, we have to take this into 
consideration and project our value function accordingly. In our example, u1 
and uz are both terminal actions. Thus, we only have to consider the effect 
of action ua. 

Luckily, state transitions are not anywhere as intricate as measurements 
in POMDPs. Figure 15.4a shows mapping of the belief upon executing us. 
Specifically, suppose we start out in state Z1 with absolute certainty, hence 
pı = 1. Then according to our transition probability model in Equation 
(15.7), we have p, = p(zj|xi,ua) = 0.2. Similarly, for pı = 0 we get 
p, = p(x |v2, us) = 0.8. In between the expectation is linear: 


p, = Eslp(zile, us)] 


cw p (xili us) p 


i=l 
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Figure 15.4 (a) The belief state parameter p! after executing action ua, as a function 
of the parameter p; before the action. Propagating the belief shown in (b) through 
the inverse of this mapping results in the belief shown in (c). (d) The value function 
V2 obtained by maximizing the propagated belief function, and the payoff of the two 
remaining actions, ui and uz. 


= 02p 408(l—pi)) = 08—06p 


This is the function graphed in Figure 15.4a. If we now back-project the 
value function in Figure 15.4b—which is equivalent to the one shown in Fig- 
ure 15.3f—we obtain the value function in Figure 15.4c. This value function 
is flatter than the one before the projection step, reflecting the loss of infor- 
mation through the state transition. It is also mirrored, since in expectation 
the state changes when executing us. 

Mathematically, this value function is computed by projecting (15.28) 
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through (15.29). 
一 100 (0.8 — 0.6 pı) +100 (1 — (0.8 — 0.6 p1)) 
Vi(b| u3) = max 40 (0.8 — 0.6 pı) +55 (1 — (0.8 — 0.6 p1)) 
100 (0.8 — 0.6 pı) —50 (1 — (0.8 — 0.6 p1)) 
—100 (0.8 — 0.6 pı) +100 (0.2 + 0.6 pı) 
— max 40(0.8— 0.6 pı) +55 (0.2 + 0.6 pı) 
100 (0.8 — 0.6 pı) | —50 (0.2 + 0.6 pı) 





60 pı —60 (1 — pı) 
52 pı +43 (1-— pı) 


一 20 pi 十 70 (1 = pi) 


— max | 


These transformations are easily checked by hand. Figure 15.4c shows this 
function, along with the optimal control actions. 

We have now almost completed the vale function V2 with a planning hori- 
zon of T = 2. Once again, the robot is given a choice whether to execute the 
control us, or to directly engage in any of the terminal actions u1 or u». As 
before, this choice is implemented by adding two new options to our consid- 
eration, in the form of the two linear functions r(b, u1) and r(b, u2). We also 
must subtract the cost of executing action us from the value function. 

This leads to the diagram in Figure 15.4d, which is of the form 





—100p, +100 (1 — pı) (x) 
100 pı  —50(1— pı) (*) 

V2(b) = max 59p; —61 (1-— pı) 
5l pı +42 (1-— pı) (x) 

—21pi +69 (1 -— pı) 





Notice that we simply added the two options (lines one and two), and sub- 
tracted the uniform cost of u3 from all other linear constraints (lines three 
through five). Once again, only three of those constraints are needed, as 
indicated by the («)’s. The resulting value can thus be rewritten as 


—100 pı +100 (1 — pı) 
Vo(b) = max 100 pı  —50(1— pı) 
51 pı +42 (1-— pı) 
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Figure 15.5 The value function V for horizons T = 10 and T = 20. Note that the 
vertical axis in these plots differs in scale from previous depictions of value functions. 


Deep Horizons and Pruning 


We have now executed a full backup step in belief space. This algorithm is 
easily recursed. Figure 15.5 shows the value function at horizon T = 10 and 
T = 20, respectively. Both of these value functions are seemingly similar. 
With appropriate pruning, V29 has only 13 components 


—100 pı --100 (1 
100 pı —50 (1 

64.1512 pı 65.9454 (1 

64.1513 pı +65.9454 (1 

64.1531 pl +65.9442 (1 

68.7968 pı +62.0658 (1 

Voo(b) = max 68.7968 p, +62.0658 (1 
69.0914 pı +61.5714 (1 

68.8167 pı +62.0439 (1 

69.0369 pı +61.6779 (1 

41.7249 p. +76.5944 ( 

39.8427 pı +77.1759 ( 

39.8334 pı +77.1786 (1 


— pı) 
— pı) 
— pı) 
— pı) 
— pı) 
— pı) 
— p) 
— pı) 
— p) 
— p) 
1—p) 
1—p) 
= pı) 








We recognize the two familiar linear functions on the top; all others corre- 
spond to specific sequences of measurements and action choices. 

As simple consideration shows that pruning is of essence. Without prun- 
ing, each update brings two new linear constraints (action choice), and then 
squares the number of constraints (measurement). Thus, an unpruned value 
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function for T = 20 is defined over 10547,864 linear functions; at T = 30 we 
have 10561012337 linear constraints. The pruned value function, in compari- 
son, contains only 13 such constraints. 

This enormous explosion of linear pieces is a key reason why plain 
POMDPs are impractical. Figure 15.6 compares side-by-side the steps that 
led to the value function V2. The left column shows our pruned functions, 
whereas the right row maintains all linear functions without pruning. While 
we only have a single measurement update in this calculation, the number 
of unused functions is already enormous. We will return to this point later, 
when we will devise efficient approximate POMDP algorithms. 

A final observation of our analysis is that the optimal value function for 
any finite horizon is continuous, piecewise linear, and convex. Each linear 
piece corresponds to a different action choice at some point in the future. 
The convexity of the value function indicates the rather intuitive observation, 
namely that knowing is always superior to not knowing. Given two belief 
states b and b’, the mixed value of the belief states is larger or equal to the 
value of the mixed belief state, for some mixing parameter p with 0 < 8 < 1: 


8V(b--(—-8)V(V) > V(8b-(1— B)U) 


This characterization only applies to the finite horizon case. Under infinite 
horizon, the value function can be discontinuous and nonlinear. 


The Finite World POMDP Algorithm 


The previous section showed, by example, how to calculate value functions 
in finite worlds. Here we briefly discuss a general algorithm for calculating 
a value function, before deriving it from first principles. 

The algorithm POMDP is listed in Table 15.1. This algorithm accepts as 
an input just a single parameter: T, the planning horizon for the POMDP. It 
returns a set of parameter vectors, each of the form 


(v1,..., UN) 


Each of these parameters specifies a linear function over the belief space of 
the form 


Sou Di 
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Figure 15.6 Comparison of an exact pruning algorithm (left row) versus a non- 
pruning POMDP algorithm (right row), for the first few steps of the POMDP planning 
algorithm. Obviously, the number of linear constraints increases dramatically with- 
out pruning. At T' — 20, the unpruned value function is defined over 10547864 linear 
functions, whereas the pruned one only uses 13 such functions. 
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1: Algorithm POMDP(T): 

2: Y = (0;0,...,0) 

3: fort =1toT do 

4: Y'-0 

5: for all (u;vf,...,vÀ) in Y do 

6: for all control actions u do 

7: for all measurements z do 

8: for j = 1 to N do 

N 
9: 人 = 5 v? p(z | vi) plx; | u,v;) 
i=1 

10: endfor 

11: endfor 

12: endfor 

13: endfor 

14: for all control actions u do 

15: for all k(1),..., kK(M) = (1,...,1) to (JT|,...,|T]) do 
16: fori = 1 to N do 

17: vy =y henu + y» Ecl 

18: endfor 

19: add (u;vj,..., v) to T" 

20: endfor 

21: endfor 

22: optional: prune Y' 

23: Y-Y 

24: endfor 

25: return 工 








Table15.1 The POMDP algorithm for discrete worlds. This algorithm represents the 
optimal value function by a set of linear constraints, which are calculated recursively. 
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1: Algorithm policy POMDP(Y, b = (p1,...,pn)): 
N 
2: & — argmax 5 vf Pi 
(uyvk,...,uk ET i=l 
3 return u 








Table 15.2 The algorithm for determining the optimal action for a policy repre- 
sented by the set of linear functions YT. 


The actual value is governed by the maximum of all these linear functions: 


5 Ui Pi 
a 

The algorithm POMDP computes this value function recursively. An initial 
set for the pseudo-horizon T' = 0 is set in line 2 of Table 15.1. The algo- 
rithm POMDP then recursively computes a new set in the nested loop of 
lines 3-24. A key computational step takes place in line 9: Here, the coeffi- 
cients TER j Of the linear functions needed to compute the next set of linear 
constraints are computed. Each linear function results from executing con- 
trol u, followed by observing measurement z, and then executing control u’. 
The linear constraint corresponding to w was calculated in the previous iter- 
ation for a smaller planning horizon (taken in line 5). Thus, upon reaching 
line 14, the algorithm has generated one linear function for each combination 
of control action, measurement, and linear constraint of the previous value 
function. 

The linear constraints of the new value function result by taking the expec- 
tations over measurements, as done in lines 14-21. For each control action, 
the algorithm generates K ™ such linear constraints in line 15. This large 
number is due to the fact that each expectation is taken over the M pos- 
sible measurements, each of which can be “combined” with any of the K 
constraints contained in the previous value function. Line 17 computes the 
expectation for each such combination. The resulting constraint is added to 
the new set of constraints in line 19. 

The algorithm for finding the optimal control action is shown in Ta- 
ble 152. The input to this algorithm is a belief state, parameterized by 
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b = (pi,..., pu), along with the set of linear functions Y. The optimal ac- 
tion is determined by search through all linear functions, and identifying the 
one that maximizes its value for b. This value is returned in line 3 of the 
algorithm policy POMDP:, Table 15.2. 


Mathematical Derivation of POMDPs 


Value Iteration in Belief Space 


The general update for the value function implements (15.2), restated here 
for convenience. 


Vr(b) = y max K u) + f vie p(b’ | u,b) db’ 


We begin by transforming this equation into a more practical form, one that 
avoids integration over the space of all possible beliefs. 

A key factor in this update is the conditional probability p(b' | u,b). This 
probability specifies a distribution over probability distributions. Given a 
belief b and a control action u, the outcome is indeed a distribution over dis- 
tributions. This is because the concrete belief b’ is also based on the next 
measurement, the measurement itself is generated stochastically. Dealing 
with distributions of distributions adds an element of complexity that is un- 
desirable. 

If we fix the measurement, the posterior b’ is unique and p( | u, b) degen- 
erate to a point-mass distribution. Why is this so? The answer is provided by 
the Bayes filter. From the belief b before action execution, the action u, and 
the subsequent observation z, the Bayes filter calculates a single, posterior 
belief b’ which is the single, correct belief. Thus, we conclude that if only we 
knew z, the integration over all beliefs in (15.38) would be obsolete. 

This insight can be exploited by re-expressing 


usb) = f» | u,b,2) ple | u,b) de 
where p(b’ | u,b, z) is a point-mass distribution focused on the single belief 


calculated by the Bayes filter. Plugging this integral into Equation (15.38) 
gives us 


Vr() = y max [red ? | i Vect (6,2) ar S Tas) a: 


5rjs.cn 000000 


532 


(15.41) 


(15.42) 


(15.43) 
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The inner integral 


/ Vra (b) p(b! | u,b, 2) di 


contains only one non-zero term. This is the term where 0’ is the distribution 
calculated from b, u, and z using the Bayes filter. Let us call this distribution 
B(b, u, z): 





B(b,u,z)(x') = p(x | z,u,b) 
A p(z | a’, u,b) p(z’ | u,b) 
p(z | u,b) 


= ey rela) J Pæ |b.) ple ub) dz 


= -gy el #) f vt | a2) bla) de 

The reader should recognize the familiar Bayes filter derivation that was ex- 

tensively discussed in Chapter 2, this time with the normalizer made explicit. 
We can now rewrite (15.40) as follows. Note that this expression no longer 

integrates over b’. 


Vp(b = w max [ro u) + [ veio) p(z | u,b) dz 





This form is more convenient than the original one in (15.38), since it only 
requires integration over all possible measurements z, instead of all possible 
belief distributions b’. This transformation was used implicitly in the ex- 
ample above, where a new value function was obtained by mixing together 
finitely many piecewise linear functions. 

Below, it will be convenient to split the maximization over actions from the 
integration. Hence, we notice that (15.43) can be rewritten as the following 
two equations: 


Vr(b,u) = wv K u) + j Vr 4A (B(b, u, z)) p(z | u,b) dz 
Vr(b) = maxVr(b, u) 
Here Vr (b, u) is the horizon T-value function over the belief b, assuming that 
the immediate next action is u. 
Value Function Representation 


As in our example, we represent the value function by a maximum of a set 
of linear functions. We already discussed that any linear function over the 
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belief simplex can be represented by the set of coefficients v1,...,un: 
N 
i=l 
where, as usual, pi, ...,pw are the parameters of the belief distribution b. As 


in our example, a piecewise linear and convex value function Vr (b) can be 
represented by the maximum of a finite set of linear functions 


za k y. 
V(b) = 225 Di 


where vf,...,vi denotes the parameters of the k-th linear function. The 
reader should quickly convince herself that the maximum of a finite set of 
linear functions is indeed a convex, continuous, and piecewise linear func- 
tion. 


Calculating the Value Function 


We will now derive a recursive equation for calculating the value function 
Vr (b). We assume by induction that Vr. ; (b), the value function for horizon 
T — 1, is represented by a piecewise linear function as specified above. As 
part of the derivation, we will show that under the assumption that Vr. (b) 
is piecewise linear and convex, Vr (b) is also piecewise linear and convex. 
Induction over the planning horizon T' then proves that all value functions 
with finite horizon are indeed piecewise linear and convex. 

We begin with Equations (15.44) and (15.45). If the measurement space is 
finite, we can replace the integration over z by a finite sum. 


Vr(b, u) y |r(b,u) PA est 1(B(b, u, z)) p(z | u,b) 
Vr(b) = max Vr(b, u) 


The belief B(b, u, z) is obtained using the following expression, derived from 
Equation (15.42) by i the integral with a finite sum. 


B(b,u,z)(u) = XD plz |x’) Dv (a! | u, x) d(x) 


If the belief b is represented by the parameters [pi,..., pw), and the belief 
B(b,u, z) by (pi,.... py }, it follows that the j-th parameter of the belief b’ is 
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computed as follows: 


N 


ple | 25) XO ples | usas) pi 


i=1 


1 


/ pec 
Pj D(z | u,b) 


To compute the value function update (15.48), let us now find more practi- 
cal expressions for the term Vr_1(B(b, u, z)), using the finite sums described 
above. Our derivation starts with the definition of Vr. 4 and substitutes the 
p; according to Equation (15.51): 





N 
Vr aA(B(b, u, z)) = ina yu pj 
j=l 
N 1 N 
= k 
加 us me p(z | u, b) plz | vj) 2. D(x; | u, £i) Pi 
1 x N 
= sGTu B MEX 255 Peles) DY ples | wsi) pi 
i à j=l j=l 
**) 
1 N N 
~ plæl|u,b) 2 2: di 2- vj p(z | 25) p(x; | u, xi) 





The term marked (*) is independent of the belief. Hence, the function labeled 
(xx) is a linear function in the parameters of the belief space, p, ..., pw. The 
term 1/p(z | u, b) is both nonlinear and difficult to compute, since it contains 
an entire belief b as conditioning variable. However, the beauty of POMDPs 
is that this expression cancels out. In particular, substituting this expression 
back into (15.48) yields the following update equation: 





N N 
Vrou) = y [rO € >> max D> pi vf ple | 25) play | usas) 
z i=1 — j=l 


Hence, despite the non-linearity arising from the measurement update, 
Vr (b, u) is once again piecewise linear. 
Finally, we note that r(b, u) is given by the expectation 


N 
r(bu) = E,r(x,u) = > Pi r(zj,u) 
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Here we assumed that the belief b is represented by the parameters 


in. Pate Dn}. 
The desired value function Vr is now obtained by maximizing Vr(b, u) 
over all actions u, as stated in (15.49): 


Vr(b) = max Vr(b,u) 
N 
= 7 max (> r(zj,u) *2 max 
N N 
5 pi» vË p(z | £3) p(x; | u, xi) ) 


=: vk , 
uz, 


N 
k 
十 > ms > Di Uuzá 
Zz i=l 


(*) 








N 
— fy max p» r(zi,u) 
i=l 








with 


N 
Uk zi = 5 vj p(z | £j) p(x, | uui) 
j=l 


as indicated. This expression is not yet in the form of a maximum of lin- 
ear functions. In particular, we now need to change the sum-max-sum ex- 
pression labeled (*) in (15.55) into a max-sum-sum expression, which is the 
familiar form of a maximum over a set of linear functions. 

We utilize the same transformation as in our example, Chapter 15.2.3. 
Specifically, suppose we would like to compute the maximum 


max{ay(Z),...,@n(a)} + max(bi(z),..., b. (z)) 
for some functions a4(z),...,a4(z) and b1(z),...,b,(x) over a variable zx. 


This maximum is attained at 


ing ton [a; (x) + 55 (z)] 


This follows from the fact that each a; 4- b; is indeed a lower bound. Further 
for any x there must exist an i and j such that a;(x)+b; (x) defines the maxi- 
mum. By including all such potential pairs in (15.58) we obtain a tight lower 
bound, i.e., the solution. 
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This is now easily generalized into arbitrary sums over max expressions: 
= N N N m 
A O c e A 
We apply now this “trick” to our POMDP value function calculation and 
obtain for the expression (*) in (15.55). Let M be the total number of mea- 


surements. 


N 
max Pi vU ; = max max = max 24 
m ko feet k(1) k(2) k(M) i Vu, 
= 


zZ 


-pn Ae D a 
Here each k( ) is a separate variable, each of uius takes on the values of the 
variable k on the left hand side. There are as many such variables as there 
are measurements. As a result, the desired value function is now obtained as 
follows: 


Vr(b) = y max yom r(zi,u 


k(z) 
n a 
i Far kat) & nD ees 





la eed mag max Do L Ti, U ee m 


In other words, each combination 


EE 


makes for a new linear constraint in iba value function Vr. 

There will be one such constraint for each unique joint setting of the vari- 
ables k(1), k(2), ..., k(M). Obviously, the maximum of these linear functions 
is once again piecewise linear and convex, which proves that this represen- 
tation indeed is sufficient to represent the correct value function over the 
underlying continuous belief space. Further, the number of linear pieces will 
be doubly exponential in the size of the measurement space, at least for our 
naive implementation that retains all such constraints. 


Practical Considerations 


The value iteration algorithm discussed thus far is far from practical. For 
any reasonable number of distinct states, measurements, and controls, the 
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(a) pruned value function V3 (6) (b) PBVI value function V39 (0) 
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Figure 15.7 The benefit of point-based value iteration over general value iteration: 
Shown in (a) is the exact value function at horizon T' — 30 for a different example, 
which consists of 120 constraints, after pruning. On the right is the result of the PBVI 
algorithm retaining only 11 linear functions. Both functions yield virtually indistin- 
guishable results when applied to control. 


complexity of the value function is prohibitive, even for relatively beginning 
planning horizons. 

There exists a number of opportunities to implement more efficient algo- 
rithms. One was already discussed in our example: The number of linear 
constraints rapidly grows prohibitively large. Fortunately, a good number of 
linear constraints can safely be ignored, since they do not participate in the 
definition of the maximum. 

Another related shortcoming of the value iteration algorithm is that it com- 
putes value functions for all belief states, not just for the relevant ones. When 
a robot starts at a well-defined belief state, the set of reachable belief states 
is often much smaller. For example, if the robot seeks to move through two 
doors for which it is uncertain as to whether they are open or closed, it surely 
knows the state of the first when reaching the second. Thus, a belief state 
in which the second door's state is known but the first one is not is physi- 
cally unattainable. In many domains, huge subspaces of the belief space are 
unattainable. 

Even for the attainable beliefs, some might only be attained with small 
probability; others may be plainly undesirable so that the robot will generally 
avoid them. Value iteration makes no such distinctions. In fact, the time and 
resources invested in value computations are independent of the odds that a 
belief state will actually be relevant. 
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Figure 15.8 Indoor environment, in which we seek a control policy for finding a 
moving intruder. (a) Occupancy grid map, and (b) discrete state set used by the 
POMDP The robot tracks its own pose sufficiently well that the pose uncertainty 
can be ignored. The remaining uncertainty pertains to the location of the person. 
Courtesy of Joelle Pineau, McGill University. 


There exists a flurry of algorithms that are more selective with regards to 
the subspace of the belief state for which a value function is computed. One 
of them is point-based value iteration, or PBVI. It is based on the idea of main- 
taining a set of exemplary belief states, and restricting the value function to 
constraints that maximize the value function for at least one of these belief 
states. More specifically, imagine we are given a set B = {b1, b2, . . .} of belief 
states, called belief points. Then the reduced value function V with respect to 
B is the set of constraints v € V for which we can find at least one b; € B 
such that v(b;) = V(b;). In other words, linear segments that do not coin- 
cide with any of the discrete belief points in B are discarded. The original 
PBVI algorithm calculates the value function efficiently by not even gener- 
ating constraints that are not supported by any of the points; however, the 
same idea can also be implemented by pruning away all line segments after 
generating them in a standard POMDP backup. 
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(a) 


(b) 


(c) 


(d) 
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Figure 15.9 A successful search policy. Here the tracking of the intruder is imple- 
mented via a particle filter, which is then projected into a histogram representation 
suitable for the POMDP. The robot first clears the room on the top, then proceeds 
down the hallway. Courtesy of Joelle Pineau, McGill University. 
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The idea of maintaining a belief point set B can make value iteration sig- 
nificantly more efficient. Figure 15.7a shows the value function for a prob- 
lem that differs from our example in Chapter 15.2 by only one aspect: The 
state transition function is deterministic (simply replace 0.8 by 1.0 in (15.7) 
and (15.8)). The value function in Figure 15.7a is optimal with respect to 
the horizon T = 30. Careful pruning along the way reduced it to 120 
constraints, instead of the 10561012337 that a non-pruning implementation 
would give us—assuming the necessary patience. With a simple point set 
B = {pı = 0.0,p1 = 0.1, p1 = 0.2,...,p1 = 1}, we obtain the value function 
shown on the right side of Figure 15.7b. This value function is approximate, 
and it consists of only 11 linear functions. More importantly, its calculation 
is more than 1,000 times faster. 

The use of belief points has a second important implication: The prob- 
lem solver can select belief points deemed relevant for the planning pro- 
cess. There exists a number of heuristics to determine a set of belief points. 
Chief among them are to identify reachable beliefs (e.g., through simulating 
the robot in the POMDP), and to find beliefs that are spaced reasonably far 
apart from each other. By doing so it is usually possible to get many or- 
ders of magnitude faster POMDP algorithms. In fact, it is possible to grow 
the set B incrementally, and to therefore build up the set of value functions 
Vi, v5,..., Vr incrementally, by adding new linear constraints to all of them 
whenever a new belief point is added. In this way, the planning algorithm 
becomes anytime, in that it produces increasingly better results as time goes 
on. 

An emerging view in robotics is that the number of plausible belief states 
exceeds that of the number of states only by a constant factor. As a conse- 
quence, techniques that actively select appropriate regions in belief space for 
updating during planning have fundamentally different scaling properties 
than the flat, unselective value iteration approach. 

A typical robotics result of PBVI is shown in Figures 15.8 and 15.9. Fig- 
ure 15.8a depicts an occupancy grid map of an indoor environment that con- 
sists of a long corridor and a room. The robot starts on the right side of the 
diagram. Its task is to find an intruder that moves according to Brownian 
motion. To make this task amenable to PBVI planning, a low-dimensional 
state space is required. The state space used here is shown in Figure 15.8b. It 
tessellates the grid map into 22 discrete regions. The granularity of this rep- 
resentation is sufficient to solve this task, while it makes computing the PBVI 
value function computationally feasible. The task of finding such an intruder 
is inherently probabilistic. Any control policy has to be aware of the uncer- 
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tainty in the environment, and seek its reduction. Further, it is inherently 
dynamic. Just moving to spaces not covered yet is generally insufficient. 
Figure 15.9 shows a typical result of POMDP planning. Here the robot has 
determined a control sequence that first explores the relatively small room, 
then progresses down the corridor. This control policy exploits the fact that 
while the robot clears the room, the intruder has insufficient time to pass 
through the corridor. Hence, this policy succeeds with high probability. 

This example is paradigmatic of applying POMDP value iteration to ac- 
tual robot control problem. Even when using aggressive pruning as in PBVI, 
the resulting value functions are still limited to a few dozen states. How- 
ever, if such a low-dimensional state representation can be found, POMDP 
techniques yield excellent results through accommodating the inherent un- 
certainty in robotics. 


Summary 


In this section, we introduced the basic value iteration algorithm for robot 
control under uncertainty. 


e POMDPs are characterized by multiple types of uncertainty: Uncertainty 
in the control effects, uncertainty in perception, and uncertainty with re- 
gards to the environment dynamics. However, POMDPs assume that we 
are given a probabilistic model of action and perception. 


* The value function in POMDPs is defined over the space of all beliefs 
robots might have about the state of the world. For worlds with N states, 
this belief is defined over the (N — 1)-dimensional belief simplex, charac- 
terized by the probability assigned to each of the N states. 


* For finite horizons, the value function is piecewise linear in the belief 
space parameters. It is also continuous and convex. Thus, it can be rep- 
resented as a maximum of a collection of finitely many linear functions. 
Further, these linear constraints are easily calculated. 


。 The POMDP planning algorithm computes a sequence of value func- 
tions, for increasing planning horizons. Each such calculation is recursive: 
Given the optimal value function at horizon T — 1, the algorithm proceeds 
to computing the optimal value function at horizon T. 


* Each recursive iteration combines a number of elements: The action 
choice is implemented by maximizing over sets of linear constraints, 
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where each action carries its own set. The anticipated measurement is 
incorporated by combining sets of linear constraints, one for each mea- 
surement. The prediction is then implemented by linearly manipulating 
the set of linear constraints. Payoff is generalized into the belief space by 
calculating its expectation, which once again is linear in the belief space 
parameters. The result is a value backup routine that manipulates linear 
constraints. 


* We find that the basic update produces intractably many linear con- 
straints. Specifically, in each individual backup the measurement step 
increases the number of constraints by a factor that is exponential in the 
number of possible measurements. Most of these constraints are usually 
passive, and omitting them does not change the value function at all. 


* Point-based value iteration (PBVI) is an approximate algorithm that main- 
tains only constraints that are needed to support a finite set of representa- 
tive belief states. In doing so, the number of constraints remains constant 
instead of growing doubly exponentially (in the worst case). Empirically 
PBVI provides good results when points are chosen to be representative 
and well-separated in the belief space. 


In many ways, the material presented in this chapter is of theoretical interest. 
The value iteration algorithm defines the basic update mechanism that un- 
derlies a great number of efficient decision making algorithms. However, it 
in itself is not computationally tractable. Efficient implementations therefore 
resort to approximations, such as the PBVI technique we just discussed. 
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(1995) led to significant improvements over previous algorithms. Paired with remarkable in- 
crease of computer speed and memory available, their work enabled POMDPs to grow into a 
tool for solving small AI problems. Hauskrecht (1997) provided bounds on the complexity of 
POMDP problem solving. 
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The most significant wave of progress came with the advent of approximate techniques— 
some of which will be discussed in the next chapter. An improved grid approximation of 
POMDP belief spaces was devised by Hauskrecht (2000); variable resolution grids were intro- 
duced by Brafman (1997). Reachability analysis began to play a role in computing policies. Poon 
(2001) and Zhang and Zhang (2001) developed point-based POMDP techniques, in which the 
set of belief states were limited. Unlike Hauskrecht’s (2000) work, these techniques relied on 
piecewise linear functions for representing value functions. This work culminated in the defini- 
tion of the point based value iteration algorithm by Pineau et al. (2003b), who developed new 
any-time techniques for finding relevant belief space for solving POMDPs. Their work was later 
extended using tree-based representations (Pineau et al. 2003a). 

Geffner and Bonet (1998) solved a number of challenge problems using dynamic program- 
ming applied to a discrete version of the belief space. This work was extended by Likhachev 
et al. (2004), who applied the A* algorithm (Nilsson 1982) to a restricted type of POMDP. Fergu- 
son et al. (2004) extended this to D* planning for dynamic environments (Stentz 1995). 

Another family of techniques used particles to compute policies, paired with nearest neigh- 
bor in particle set space to define approximations to the value function (Thrun 2000a). Particles 
were also used for POMDP monitoring by Poupart et al. (2001). Poupart and Boutilier (2000) 
devised an algorithm for approximating the value function using a technique sensitive to the 
value itself, which led to state-of-the-art results. A technique by Dearden and Boutilier (1994) 
gained efficiency through interleaving planning and execution of partial policies; see Smith and 
Simmons (2004) for additional research on interleaving heuristic search-type planning and exe- 
cution. Exploiting domain knowledge was discussed in Pineau et al. (2003c), and Washington 
(1997) provided incremental techniques with bounds. Additional work on approximate POMDP 
solving is discussed in Aberdeen (2002); Murphy (2000b). One of the few fielded systems con- 
trolled by POMDP value iteration is the CMU Nursebot, whose high-level controller and dialog 
manager is a POMDP (Pineau et al. 2003d; Roy et al. 2000). 

An alternative approach to finding POMDP control policies is to search directly in the space 
of policies, without computing a value function. This idea goes back to Williams (1992), who 
developed the idea of policy gradient search in the context of MDPs. Contemporary techniques 
for policy gradient search is described in Baxter et al. (2001) and Ng and Jordan (2000). Bagnell 
and Schneider (2001) and Ng et al. (2003) successfully applied this approach to the control of 
hovering an autonomous helicopter; in fact, Ng et al. (2003) reports that it took only 11 days 
to design such a controller using POMDP techniques, using a learned model. In more recent 
work, Ng et al. (2004) used these techniques to identify a controller capable of sustained in- 
verted helicopter flight, a previously open problem. Roy and Thrun (2002) applied policy search 
techniques to mobile robot navigation, and discuss the combination of policy search and value 
iteration techniques. 

Relatively little progress has been made on learning POMDP models. Early attempts to learn 
the model of a POMDP from interaction with an environment essentially failed (Lin and Mitchell 
1992; Chrisman 1992), due to the hardness of the problem. Some more recent work on learning 
hierarchical models shows more promise (Theocharous et al. 2001). Recent work has moved 
away from learning HMM-style models, into alternative representations. Techniques for repre- 
senting and learning the structure of partially observable stochastic environments can be found 
in Jaeger (2000); Littman et al. (2001); James and Singh. (2004); Rosencrantz et al. (2004). While 
none of these papers fully solve the POMDP problem, they nevertheless are intellectually rele- 
vant and promise new insights into the largely open problem of probabilistic robot control. 
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TIGER PROBLEM 


15 Partially Observable Markov Decision Processes 


Exercises 


1. This problem is known as the tiger problem and is due to Cassandra, 
Littman and Kaelbling (Cassandra et al. 1994). A person faces two doors. 
Behind one is a tiger, behind the other a reward of +10. The person can 
either listen or open one of the doors. When opening the door with a 
tiger, the person will be eaten, which has an associated cost of —20. Lis- 
tening costs —1. When listening, the person will hear a roaring noise that 
indicates the presence of the tiger, but only with 0.85 probability will the 
person be able to localize the noise correctly. With 0.15 probability, the 
noise will appear as if it came from the door hiding the reward. 


Your questions: 


(a) Provide the formal model of the POMDP, in which you define the state, 
action, and measurement spaces, the cost function, and the associated 
probability functions. 


(b) What is the expected cumulative payoff/cost of the open-loop action 
sequence: “Listen, listen, open door 1"? Explain your calculation. 


(c) What is the expected cumulative payoff/cost of the open-loop action 
sequence: "Listen, then open the door for which we did not hear a 
noise"? Again, explain your calculation. 


(d 


— 


Manually perform the one-step backup operation of the POMDP. Plot 
the resulting linear functions in a diagram just like the ones in Chap- 
ter 15.2. Provide diagrams of all intermediate steps, and don't forget to 
add units to your diagrams. 


— 


(e) Manually perform the second backup, and provide all diagrams and 


explanations. 


(f 


Wa 


Implement the problem, and compute the solution for the planning 
horizons T = 1,2,...,8. Make sure you prune the space of all linear 
functions. For what sequences of measurements would a person still 
choose to listen, even after 8 consecutive listening actions? 


2. Show the correctness of Equation (15.26). 


3. What is the worst-case computational complexity of a single POMDP 
value function backup? Provide your answer using the O( ) notation, 
where arguments may include the number of linear functions before a 
backup, and the number of states, actions, and measurements in a discrete 
POMDP. 
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4. The POMDP literature often introduces a discount factor, which is analo- 
gous to the discount factor discussed in the previous section. Show that 
even with a discount factor, the resulting value functions are still piece- 
wise linear. 


5. Consider POMDP problems with finite state, action, and measurement 
space, but for which the horizon T 1 oo. 


(a) Will the value function still be piecewise linear? 
(b) Will the value function still be continuous? 


(c) Will the value function still be convex? 


For all three questions, argue why the answer is positive, or provide a 
counterexample in case it is negative. 


6. On page 28, we provided an example of a robot sensing and opening a 
door. In this exercise, you are asked to implement a POMDP algorithm for 
an optimal control policy. Most information can be found in the example 
on page 28. To turn this into a control task, let us assume that the robot 
has a third action: go. When it goes, it receives 十 10 payoff if the door is 
open, and —100 if it is closed. The action go terminates the episode. The 
action do nothing costs the robot —1, and push costs the robot —5. Plot 
value functions for different time horizons up to T' = 10, and explain the 
optimal policy. 
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Approximate POMDP Techniques 


Motivation 


In previous chapters, we have studied two main frameworks for action se- 
lection under uncertainty: MDPs and POMDPs. Both frameworks address 
non-deterministic action outcomes, but they differ in their ability to accom- 
modate sensor limitations. Only the POMDP algorithms can cope with un- 
certainty in perception, whereas MDP algorithms assume that the state is 
fully observable. However, the computational expense of exact planning 
in POMDPs renders exact methods inapplicable to just about any practical 
problem in robotics. 

This chapter describes POMDP algorithms that scale. As we shall see in 
this chapter, both MDP and POMDP are extreme ends of a spectrum of pos- 
sible probabilistic planning and control algorithms. This chapter reviews a 
number of approximate POMDP techniques that fall in between MDPs and 
POMDPs. The algorithms discussed here share with POMDPs the use of 
value iteration in belief space. However, they approximate the value func- 
tion in a number of ways. By doing so, they gain immense speed-ups over 
the full POMDP solution. 

The techniques surveyed in this chapter have been chosen because they 
characterize different styles of approximation. In particular, we will discuss 
the following three algorithms: 


* QMDP is a hybrid between MDPs and POMDPs. This algorithm general- 
izes the MDP-optimal value function defined over states, into a POMDP- 
style value function over beliefs. QMDP would be exact under the— 
usually false—assumption that after one step of control, the state becomes 
fully observable. Value iteration in QMDPs is of the same complexity as 
in MDPs. 
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1: Algorithm QMDP() = (p1,...,pn)): 
2: V= MDP discrete value iteration() // see page 502 
3: for all control actions u do 
N A 
4: Q(zi,u) = r(zi, u) + p V(j) p(x; | u, xi) 
j=l 
5: endfor 
N 
6: return argmax p» pi Q(zi, u) 
u i=l 








Table 16.1 The OMDP algorithm computes the expected return for each control ac- 
tion u, and then selects the action u that yields the highest value. The value function 
used here is MDP-optimal, hence dismisses the state uncertainty in the POMDP. 


* The augmented MDP, or AMDP. This algorithm projects the belief state 
into a low-dimensional sufficient statistic and performs value iteration in 
this lower-dimensional space. The most basic implementation involves a 
representation that combines the most likely state and the degree of un- 
certainty, measured by entropy. The planning is therefore only marginally 
less efficient than planning in MDPs, but the result can be quite an im- 
provement! 


e Monte Carlo POMDP, or MC-POMDP. This is the particle filter version of 
the POMDP algorithm, where beliefs are approximated using particles. 
By constructing a belief point set dynamically—just like the PBVI algo- 
rithm described towards the end of the previous chapter—MC-POMDPs 
can maintain a relatively small set of beliefs. MC-POMDPs are applicable 
to continuous-valued states, actions, and measurements, but they are sub- 
ject to the same approximations that we have encountered in all particle 
filter applications in this book, plus some additional ones that are unique 
to MC-POMDPs. 


These algorithms cover some of the primary techniques for approximating 
value functions in the emerging literature on probabilistic planning and con- 
trol. 


»js.on 000000 





16.2 


(16.1) 


(16.2) 


(16.3) 


162 QMDPs 549 


QMDPs 


QMDPs are an attempt to combine the best of MDPs and POMDPs. Value 
functions are easier to compute for MDPs than for POMDPs, but MDPs rely 
on the assumption that the state is fully observable. A OMDP is computa- 
tionally just about as efficient as an MDP, but it returns a policy that is defined 
over the belief state. 

The mathematical "trick" is relatively straightforward. The MDP algo- 
rithm discussed in Chapter 14 provides us with a state-based value function 
that is optimal under the assumption that the state is fully observable. The 
resulting value function V is defined over world states. The QMDP general- 
izes this value to the belief space through the mathematical expectation: 


N 
Vib) = Ef) = > Pi V (zi) 


Here we use our familiar notation p; — b(r;). Thus, this value function is 
linear, with the parameters 


This linear function is exactly of the form used by the POMDP value iteration 


algorithm. Hence the value function over the belief space is given by the 
following linear equation: 


: N 
V(b) = 5 Pi Ui 
i=1 


The MDP value function provides a single linear constraint in belief space. 
This enables us to apply the algorithm policy_POMDP in Table 15.2, with a 
single linear constraint. 

The most basic version of this idea leads to the algorithm QMDP shown 
in Table 16.1. Here we use a slightly different notation than in Table 15.2: 
Instead of caching away one linear function for each action u and letting 
policy_POMDP determine the action, our formulation of QMDP directly 
computes the optimal value function through a function Q. The value of 
Q(z, u), as calculated in line 4 in Table 16.1, is the MDP-value of the control 
u in state z;. The generalization to belief states then follows in line 6, where 
the expectation is taken over the belief state. Line 6 also maximizes over all 
actions, and returns the control action with the highest expected value. 

The insight that the MDP-optimal value function can be generalized to 
belief space enables us to arbitrarily combine MDP and POMDP backups. 
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(16.4) 
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In particular, the MDP-optimal value function V can be used as input to the 
POMDP algorithm in Table 15.1 (page 529). With T further POMDP backups, 
the resulting policy can actively engage in information gathering—as long 
as the information shows utility within the next T time steps. Even for very 
small values of T, we usually obtain a robust probabilistic control algorithm 
that is computationally vastly superior to the full POMDP solution. 


Augmented Markov Decision Processes 


The Augmented State Space 


The augmented MDP, or AMDP, is an alternative to the QMDP algorithm. It 
too approximates the full POMDP value function. However, instead of ignor- 
ing state uncertainty beyond a small time horizon T, the AMDP compresses 
the belief state into a more compact representation, and then performs full 
POMDP-style probabilistic backups. 

The fundamental assumption in AMDPs is that the belief space can be 
summarized by a lower-dimensional “sufficient” statistic f, which maps be- 
lief distributions into a lower dimensional space. Values and actions are 
calculated from this statistic f(b) instead of the original belief b. The more 
compact the statistic, the more efficient the resulting value iteration algo- 
rithm. 

In many situations a good choice of the statistic is the tuple 


| argmax b(x) 
Hy (2x) 


Here argmax,, b(x) is the most likely state under the belief distribution b, and 


ol 


Hx) = -f b(x) logb(x) dx 


is the entropy of the belief. This space will be called the augmented state space, 
since it augments the state space by a single value, the entropy of the belief 
distribution. Calculating a value function over the augmented state space, 
instead of the belief space, makes for a huge change in complexity. The 
augmented state avoids the high dimensionality of the belief space, which 
leads to enormous savings when computing a value function (from worst 
case doubly exponential to low-degree polynomial). 

The augmented state representation is mathematically justified if f(b) is a 
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sufficient statistic of b with regards to the estimation of value: 
Vb) = V(f(b) 


for all beliefs b the robot may encounter. In practice, this assumption will 
rarely hold true. However, the resulting value function might still be good 
enough for a sensible choice of control. 

Alternatively, one might consider different statistics, such as the moments 
of the belief distribution (mean, variance, ...), the eigenvalues and vectors of 
the covariance, and so on. 


The AMDP Algorithm 


The AMDP algorithm performs value iteration in the augmented state space. 
To do so, we have to overcome two obstacles. First, the exact value up- 
date is non-linear for our augmented state representation. This is because 
the entropy is a non-linear function of the belief parameters. It therefore 
becomes necessary to approximate the value backup. AMDPs discretize 
the augmented state, representing the value function V by a look-up table. 
We already encountered such an approximation when discussing MDPs. In 
AMDPs, this table is one dimension larger than the state space table used by 
MDPs. 

The second obstacle pertains to the transition probabilities and the payoff 
function in the augmented state space. We are normally given probabilities 
such as the motion model p(x’ | u, x), the measurement model p(z | x), and 
the payoff function r(z,u). But for value iteration in the augmented state 
space we need to define similar functions over the augmented state space. 

AMDPs use a "trick" for constructing the necessary functions. The trick is 
to learn transition probabilities and payoffs from simulations. The learning 
algorithm is based on a frequency statistic, which counts how often an aug- 
mented belief b transitions to another belief b under a control u, and what 
average payoff this transition induces. 

Table 16.2 states the basic algorithm AMDP value iteration. The algo- 
rithm breaks down into two phases. In a first phase (lines 2-19), it constructs 
a transition probability table P for the transition from an augmented state b 
and a control action u to a possible subsequent augmented state b'. It also 
constructs a payoff function & which measures the expected immediate pay- 
off r when u is chosen in the augmented state b. 

These functions are estimated through a sampling procedure, in which we 
generate n samples for each combination of b and u (line 8). For each of 
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27: return V, 万 ,及 


1: Algorithm AMDP_value_iteration(): 
2: for all b do / / learn model 
3: for all u do 
4: for all b do / / initialize model 
5: P(b, u,b’) = 0 
6: endfor 
7: R(b,u) =0 
8: repeat n times // learn model 
9: generate b with f(b) = b 
10: sample x ~ b(x) // belief sampling 
11: sample z' ~ p(z' | u, x) // motion model 
12: sample z ~ p(z | a) // measurement model 
13: calculate b = B(b, u, z) // Bayes filter 
14: calculate b = f(b’) // belief state statistic 
15: P(b,u,U) = P(b,u, 6’) ++ //learn transitions prob’s 
16: R(b,u) = R(b,u) + 2 // learn payoff model 
17: endrepeat 
18: endfor 
19: endfor 
20: for all b / / initialize value function 
21: Y (b) = rain 
22: endfor 
23: repeat until convergence // value iteration 
24: for all b do 
25: V(b) =y max 区 b) 十 5 V (V) P(b, u, | 
v 
26: endfor 


/ / return value fct & model 








1: Algorithm policy AMDP(V, P, R, b): 
25 b= f(b) 





3: return argmax 区 b) + Y V (V) P(b, u, n) 
u B 





Table 16.2 Top: The value iteration algorithm for augmented MDPs. Bottom: The 


algorithm for selecting a control action. 
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these Monte Carlo simulations, the algorithm first generates a belief b for 
which f(b) — b. This step is tricky (in fact, it is ill-defined): In the original 
AMDP model, the creators simply choose to set b to a symmetric Gaussian 
with parameters chosen to match b. Next, the AMDP algorithm samples a 
pose z, a successor pose z', and a measurement z, all in the obvious ways. It 
then applies the Bayes filter to generate a posterior belief B(b, u, z), for which 
it calculates the augmented statistics (line 14). The tables P and are then 
updated in lines 15 and 16, using simple frequency counts weighted (in the 
case of the payoff) with the actual payoff for this Monte Carlo sample. 

Once the learning is completed, AMDP continues with value iteration. 
This is implemented in lines 20-26. As usual, the value function is initial- 
ized by a large negative value. Iteration of the backup equation in line 25 
leads to a value function defined over the augmented state space. 

When using AMDPs, the state tracking usually takes place over the orig- 
inal belief space. For example, when using AMDPs for robot motion, one 
might use MCL for tracking the belief over the robot’s pose. The algorithm 
policy_AMDP in Table 16.2 shows how to extract a policy action from the 
AMDP value function. It extracts the augmented state representation from 
the full belief in line 2, and then simply chooses the control action that maxi- 
mizes the expected value (line 3). 


Mathematical Derivation of AMDPs 


The derivation of AMDP is relatively straightforward, under the assumption 
that f is a sufficient statistic of the belief state b; i.e., the world is Markov 
relative to the state f(b). We start with an appropriate modification of the 
standard POMDP-style backup in Equation (15.2). Let f be the function that 
extracts the statistic b from b, hence b = f(b) for arbitrary beliefs b. Assuming 
that f is a sufficient statistic, the POMDP value iteration equation (15.2) can 
be defined over the AMDP state space 


Vr(b) = ^ max K u) + f vic p% | u,b) db! 
where b refers to the low-dimensional statistic of b defined in (16.4). Here 
Vr_1(b’) and Vr (b) are realized through look-up tables. 


This equation contains the probability p(b' | u, b), which needs further ex- 
planation. Specifically, we have 


pË Jub) = " p(B | u,b) p | 5) db 
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J [oe | z, u,b) p(z | b) p(b | b) dz db 
J [PE = (86.20 vl |) pto) dz ab 


= f [oë - 88.2) [5612 


] | u, £) b(x) dz da! dz p(b | b) db 


This transformation exploited the fact that the posterior belief b/ is uniquely 
determined once we know the prior belief b, the control u, and the mea- 
surement z. This same "trick" was already exploited in our derivation of 
POMDPs. It enabled us to replace the distribution over posterior beliefs by 
the Bayes filter result B(b,u,z). In the augmented state space, we there- 
fore could replace p(b’ | z,u,b) by a point-mass distribution centered on 
f(B(b,u, 2)). 

Our learning algorithm in Table 16.2 approximates this equation through 
Monte Carlo sampling. It replaced each of the integrals by a sampler. The 
reader should take a moment to establish the correspondence: Each of the 
nested integrals in (16.8) maps directly to one of the sampling steps in Ta- 
ble 16.2. 

Along the same lines, we can derive an expression for the expected payoff 
r(b, u): 


r(b,u) 


fre u) p(b | b) db 
J J r(x,u) b(x) dz p(b | b) db 


Once again, the algorithm AMDP_value_iteration approximates this inte- 
gral using Monte Carlo sampling. The resulting learned payoff function 
resides in the lookup R. The value iteration backup in lines 20-26 of 
AMDP value iteration is essentially identical to the derivation of MDPs. 
As noted above, this Monte Carlo approximation is only legitimate when 
b is a sufficient statistics of b and the system is Markov with respect to b. In 
practice, this is usually not the case, and proper sampling if augmented states 
would therefore have to be conditioned on past actions and measurements— 
a nightmare! The AMDP algorithm ignores this, by simply generating b’s 
with f(b) = b. Our example above involved a symmetric Gaussian whose 
parameters matched b. An alternative—which deviates from the original 
AMDP algorithm but would be mathematically more sound—would involve 
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Figure 16.1 Examples of robot paths in a large, open environment, for two different 
configurations (top row and bottom row). The diagrams (a) and (c) show paths gen- 
erated by a conventional dynamic programming path planner that ignores the robot's 
perceptual uncertainty. The diagrams (b) and (d) are obtained using the augmented 
MDP planner, which anticipates uncertainty and avoids regions where the robot is 
more likely to get lost. Courtesy of Nicholas Roy, MIT. 


simulation of entire traces of belief states using the motion and measure- 
ment models, and using subsequent pairs of simulated belief states to learn 
P and R. Below, when discussing MC-POMDPs, we will encounter such a 
technique. MC-POMDPs sidestep this issue by using simulation to generate 
plausible pairs of belief states. 
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Figure 16.2 Performance comparison of MDP planning and Augmented MDP plan- 
ning. Shown here is the uncertainty (entropy) at the goal location as a function of the 
sensor range. Courtesy of Nicholas Roy, MIT. 


Application to Mobile Robot Navigation 


The AMDP algorithm is highly practical. In the context of mobile robot nav- 
igation, AMDPs enable a robot to consider its general level of “confusion” 
in its action choice. This pertains not just to the momentary uncertainty, but 
also future expected uncertainties a robot may experience through the choice 
of its actions. 

Our example involves a robot navigating in a known environment. It was 
already given as an example in the introduction to this book; see Figure 1.2 
on page 7. Clearly, the level of confusion depends on where the robot nav- 
igates. A robot traversing a large featureless area is likely to gradually lose 
information as to where it is. This is reflected in the conditional probabil- 
ity p(b' | u,b), which with high likelihood increases the entropy of the belief 
in such areas. In areas populated with localization features, e.g., near the 
walls with distinctive features, the uncertainty is more likely to decrease. 
The AMDP anticipates such situations and generates policies that minimize 
the time of arrival while simultaneously maximizing the certainty at the time 
of arrival at a goal location. Since the uncertainty is an estimate of the true 
positioning error, it is a good measure for the chances of actually arriving at 
the desired location. 
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Figure 16.3 The policy computed using an advanced version of AMDP with a 
learned state representation. The task is to find an intruder. The gray particles are 
drawn from the distribution of where the person might be, initially uniformly dis- 
tributed in (a). The black dot is the true (unobservable) position of the person. The 
open circle is the observable position of the robot. This policy succeeds with high 
likelihood. Courtesy of Nicholas Roy, MIT, and Geoffrey Gordon, CMU. 
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Figure 16.1 shows example trajectories for two constellations (two differ- 
ent start and goal locations). The diagrams on the left correspond to a MDP 
planner, which does not consider the robot's uncertainty. The augmented 
MDP planner generates trajectories like the ones shown on the right. In Fig- 
ures 16.1a&b, the robot is requested to move through a large open area, ap- 
proximately 40 meters wide. The MDP algorithm, not aware of the increased 
risk of getting lost in the open area, generates a policy that corresponds to 
the shortest path from the start to the goal location. The AMDP planner, in 
contrast, generates a policy that stays close to the obstacles, where the robot 
has an increased chance of receiving informative sensor measurements at the 
expense of an increased travel time. Similarly, Figure 16.1c&d considers a sit- 
uation where the goal location is close to the center of the featureless, open 
area. Here the AMDP planner recognizes that passing by known objects re- 
duces the pose uncertainty, increasing the chances of successfully arriving at 
the goal location. 

Figure 16.2 shows a performance comparison between the AMDP naviga- 
tion strategy and the MDP approach. In particular, it depicts the entropy of 
the robot's belief b at the goal location, as a function of the sensor character- 
istics. In this graph, the maximum perceptual range is varied, to study the 
effect of impoverished sensors. As the graph suggests, the AMDP has sig- 
nificantly higher chances of success. The difference is largest if the sensors 
are very poor. For sensors that have a long range, the difference ultimately 
disappears. The latter does not come as a surprise, since with good range 
sensors the amount of information that can be perceived is less dependent 
on the specific pose of the robot. 

The feature to anticipate and avoid uncertainty has led to the name coastal 
navigation, for the application of AMDPs to robot navigation. This name in- 
dicates the resemblance to ships, which, before the advent of satellite-based 
global positioning, often stayed close to the coastline so as to not lose track 
of their location. 

We close our discussion of AMDPs by noting that the choice of the statistic 
f has been somewhat arbitrary. It is possible to add more features as needed, 
butat the obvious increase of computational complexity. Recent work has led 
to algorithms that learn statistics f, using non-linear dimensionality reduc- 
tion techniques. Figure 16.3 shows the result of such a learning algorithm, 
applied to the problem of clearing a building from a moving intruder. Here 
the learning algorithm identifies a 6-dimensional state representation, which 
captures the belief of the robot for any plausible pursuit strategy. The gray 
particles represent the robot's belief about the intruder's location. As this 
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example illustrates, AMDPs with a learned state representation succeed in 
generating quite a sophisticated strategy: The robot first clears part of the 
corridor, but always stays close enough to the room on the top that an in- 
truder cannot escape from there. It then clears the room, in time short enough 
to prevent an intruder to pass by the corridor undetected. The robot finally 
continues its pursuit in the corridor. 


Monte Carlo POMDPs 


Using Particle Sets 


The final algorithm discussed in this chapter is the particle filter solution to 
POMDPs, called MC-POMDP, which is short for Monte Carlo POMDP. MC- 
POMDPs acquire a value function that is defined over particle sets. Let V be 
a particle set representing a belief b. Then the value function is represented 
as a function 


V:X-—m*xt 


This representation has a number of advantages, but it also creates a number 
of difficulties. A key advantage is that we can represent value functions over 
arbitrary state spaces. In fact, among all the algorithms discussed thus far, 
MC-POMDPs are the only ones that do not require a finite state space. Fur- 
ther, MC-POMDPs use particle filters for belief tracking. We have already 
seen a number of successful particle filter applications. MC-POMDPs extend 
particle filters to planning and control problems. 

The primary difficulty in using particle sets in POMDPs pertains to the 
representation of the value function. The space of all particle sets of any 
given size M is M-dimensional. Further, the probability that any particle set 
is ever observed twice is zero, due to the stochastic nature of particle gener- 
ation. As a result, we need a representation for V that can be updated using 
some particle set, but then provides a value for other particle sets, which 
the MC-POMDP algorithm never saw before. In other words, we need a 
learning algorithm. MC-POMDP use a nearest neighbor algorithm using locally 
weighted interpolation when interpolating between different beliefs. 


The MC-POMDP Algorithm 


Table 16.3 lays out the basic MC-POMDP algorithm. The MC-POMDP al- 
gorithm required a number of nested loops. The innermost loop, in lines 6 
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Algorithm MC-POMDP(,, V): 


repeat until convergence 
sample x ~ b(x) / / initialization 


initialize A with M samples of b(x) 


repeat until episode over 
for all control actions u do / / update value function 

Q(u) — 0 

repeat n times 
select random x € X 
sample z' ~ p(z' | u, x) 
sample z ~ p(z | x’) 
X' = Particle_filter(¥, u, z) 


Q(u) = Qu) += 7 [r(z,u)  V() 


endrepeat 
endfor 
V(X) = max Q(u) // update value function 
u* = argmax Q(u) // select greedy action 
sample z' ~ p(z' | u, £) // simulate state transition 


sample z ~ p(z | x’) 

X' = Particle_filter(V,u,z) // compute new belief 

setz =a2';X = X' // update state and belief 
endrepeat 


endrepeat 


return V 





Table 16.3 The MC-POMDP algorithm. 
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through 16 in Table 16.3, updates the value function V for a specific belief 
X. It does so by simulating for each applicable control action u, the set of 
possible successor beliefs. This simulation takes place in lines 9 through 12. 
From that, it gathers a local value for each of the applicable actions (line 13). 
The value function update takes place in line 16, in which V is simply set to 
the maximum of all Q,,’s. 

Following this local backup is a step in which MC-POMDPs simulate the 
physical system, to generate a new particle set X. This simulation takes place 
in lines 17 through 21. In our example, the update always selects the greedy 
action (line 17); however, in practice it may be advantageous to occasionally 
select a random action. By transitioning to a new belief X, the MC-POMDP 
value iteration performs the update for a different belief state. By iterating 
through entire episodes (outer loops in lines 2 through 5), the value function 
is eventually updated everywhere. 

The key open question pertains to representation of the function V. MC- 
POMDP uses a local learning algorithm reminiscent of nearest neighbor. This 
algorithm grows a set of reference beliefs A; with associated values V;. When 
a query arrives with a previously unseen particle set Vguery, MC-POMDP 
identifies the K "nearest" particle sets in its memory. To define a suitable 
concept of nearness for particle sets requires additional assumptions. In the 
original implementation, MC-POMDP convolves each particle with a Gaus- 
sian with small, fixed covariance, and then measures the KL-divergence be- 
tween the resulting mixtures of Gaussians. Leaving details aside, this step 
makes it possible to determine K nearest reference particle sets A,,..., Xg, 
with an associated measure of distance, denoted d;,...,dy (we note that KL- 
divergence is not a distance in the technical sense, since it is asymmetric). The 


value of the query set Xquery, is then obtained through the following formula 


with 7 — ee 4] 2 Here X; is the k-th reference belief in the set of K 
nearest neighbors, and dx is the associated distance to the query set. This 
interpolation formula, known as Shepard's interpolation, explains how to cal- 
culate V (X") in line 13 of Table 16.3. 

The update in line 16 involves an implicit case differentiation. If the ref- 
erence set contains already K particle sets whose distance is below a user- 
defined threshold, the corresponding V-values are simply updated in pro- 
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portion to their contribution in the interpolation: 
1 

Ve — Wtan P (max Q(u) — V&) 
k u 


where a is a learning rate. The expression max, Q(u) is the “target” value 
for the function V, and 7 dr is the contribution of the k-th reference particle 
set under Shepard's interpolation formula. 

If there are less than K particle sets whose distance falls below the thresh- 
old, the query particle set is simply added into the reference set, with the 
associated value V — max, Q(u). In this way, the set of reference particle 
sets grow over time. The value of K and the user-specified distance threshold 
determine the smoothness of the MC-POMDP value function. In practice, se- 
lecting appropriate values will take some thought, since it is easy to exceed 
the memory of an ordinary PC with the reference set, when the threshold is 
chosen too tightly. 


Mathematical Derivation of MC-POMDPs 


The MC-POMDP algorithm relies on a number of approximations: the use 
of particle sets constitutes one such approximation. Another one is the lo- 
cal learning algorithm for representing V, which is clearly approximate. A 
third approximation is due to the Monte Carlo backup step of the value func- 
tion. Each of these approximations jeopardizes convergence of the basic al- 
gorithm. 

The mathematical justification for using particle filters was already pro- 
vided in Chapter 4. The Monte Carlo update step follows from the general 
POMDP update Equation (15.43) on page 532, which is restated here: 


Vp(b = y max [ro u) + / Vr_1(B(b, u, z)) p(z | u,b) dz 
The Monte Carlo approximation is now derived in a way entirely analogous 


to our AMDP derivation. We begin with the measurement probability p(z | 
u, b), which resolves as follows: 


p(z|uwb) = J / plz | a) plo! | u x) b(m) dz da! 


Similarly, we obtain for r(b, u): 


r(b,u) = [rev b(x) dx 
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This enables us to re-write (16.13) as follows: 


V(b) = y max |f u) b(z) dz 


«f vac (b, u, z)) [f [» (z | 2’) p | usa) br) de de | de 


E vss (m, u) + Vr 1(B(b,u, z)) p(z | x’) p(z' | u,z)] 
(x) da dx’ dz 


The Monte Carlo approximation to this integral is now a multi-variable sam- 
pling algorithm, which requires us to sample x ~ b(x), z' ~ p(x" | u,x) and 
z ~ p(z | 2’). Once we have z, x’, and z, we can compute B(b, u, z) via 
the Bayes filter. We then compute Vr. 1(B(b, u, z)) using the local learning 
algorithm, and r(x, u) by simply looking it up. We note that all these steps 
are implemented in 7 through 14 of Table 16.3, with the final maximization 
carried out in line 16. 

The local learning algorithm, which plays such a central role in MC- 
POMDPs, may easily destroy any convergence we might otherwise obtain 
with the Monte Carlo algorithm. We will not attempt to characterize the con- 
ditions under which local learning may give accurate approximations, but 
instead simply state that care has to be taken setting its various parameters. 


Practical Considerations 


From the three POMDP approximations provided in this chapter, MC- 
POMDP is the least developed and potentially the least efficient one. Its 
approximation relies on a learning algorithm for representing the value func- 
tion. Implementing an MC-POMDP algorithm can be tricky. A good under- 
standing of the smoothness of the value function is required, as is the number 
of particles one seeks to employ. 

The original implementation of the MC-POMDP algorithm led to the re- 
sults shown in Figure 16.4. A robot, shown in Figure 16.4a, is placed near a 
graspable object located on the floor near the robot, which it can detect using 
a camera. However, initially the object is placed outside the robot's percep- 
tual field. A successful policy will therefore exhibit three stages: A search 
stage, in which the robot rotates until it senses the object; a motion stage in 
which the robot centers itself relative to the object so that it can grasp it; and a 
final grasping action. The combination of active perception and goal-directed 
behavior make this a relatively challenging probabilistic control problem. 
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Figure 16.4 A robotic find and fetch task: (a) The mobile robot with gripper and 
camera, holding the target object. (b) 2-D trajectory of three successful policy execu- 
tions, in which the robot rotates until it sees the object, and then initiates a successful 
grasp action (c) success rate as a function of number of planning steps, evaluated in 
simulation. 


Figure 16.4b shows example episodes, in which the robot turned, moved, 
and grasped successfully. The trajectories shown there are projected motion 
directories in 2-D. Quantitative results are shown in Figure 16.4c, which plots 
the success rate as a function of update iterations of the MC-POMDP value 
iteration. 4,000 iterations of value backup require approximately 2 hours 
computation time, on a low-end PC, at which point the average performance 
levels off at 80%. The remaining 20% failures are largely due to configura- 
tions in which the robot fails to position itself to grasp the object—in part a 
consequence of the many approximations in MC-POMDPs. 
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Summary 


In this section, we introduced three approximate probabilistic planning and 
control algorithms, with varying degrees of practical applicability. All three 
algorithms relied on approximations of the POMDP value function. How- 
ever, they differed in the nature of their approximations. 


The QMDP framework considers uncertainty only for a single action 
choice: It is based on the assumption that after the immediate next con- 
trol action, the state of the world suddenly becomes observable. The full 
observability made it possible to use the MDP-optimal value function. 
OMDP generalizes the MDP value function to belief spaces through the 
mathematical expectation operator. As a result, planning in QMDPs is as 
efficient as in MDPs, but the value function generally overestimates the 
true value of a belief state. 


Extensions of the OMDP algorithm combine the MDP-optimal value 
function with a sequence of POMDP backups. When combined with 
T POMDP backup steps, the resulting policy considers information- 
gathering actions within the horizon of T, and then relies on the QMDP 
assumption of a fully observable state. The larger the horizon T' is, the 
closer the resulting policy to the full POMDP solution. 


The AMDP algorithm pursues a different approximation: It maps the be- 
lief into a lower-dimensional representation, over which it then performs 
exact value iteration. The "classical" representation consists of the most 
likely state under a belief, along with the belief entropy. With this repre- 
sentation, AMDPs are like MDPs with one added dimension in the state 
representation that measures the global degree of uncertainty of the robot. 


To implement an AMDP, it becomes necessary to learn the state transition 
and the reward function in the low-dimensional belief space. AMDPs 
achieve this by an initial phase, in which statistics are cached into look- 
up tables, representing the state transition and reward function. Thus, 
AMDPs operate over a learned model, and are only accurate to the degree 
that the learned model is accurate. 


AMDPs applied to mobile robot navigation in known environments is 
called coastal navigation. This navigation technique anticipates uncer- 
tainty, and selects motion that trades off overall path length with the 
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uncertainty accrued along a path. The resulting trajectories differ signifi- 
cantly from any non-probabilistic solution: A “coastally” navigating robot 
stays away from areas in which chances of getting permanently lost are 
high. Being temporarily lost is acceptable, if the robot can later re-localize 
with sufficiently high probability. 


* The MC-POMDP algorithm is the particle filter version of POMDPs. It 
calculates a value function defined over sets of particles. To implement 
such a value function, MC-POMDPs had to resort to a local learning tech- 
nique, which used a locally weighted learning rule in combination with a 
proximity test based on KL-divergence. MC-POMDPs then apply Monte 
Carlo sampling to implement an approximate value backup. The result- 
ing algorithm is a full-fledged POMDP algorithm whose computational 
complexity and accuracy are both functions of the parameters of the learn- 
ing algorithm. 


The key lesson to take away from this chapter is that there exists a number of 
approximations whose computational complexity is much closer to MDPs, 
but that still consider state uncertainty. No matter how crude the approx- 
imation, algorithms that consider state uncertainty tend to be significantly 
more robust than algorithms that entirely ignore state uncertainty. Even a 
single new element in the state vector—which measures global uncertainty 
in a one-dimensional way—can make a huge difference in the performance 
of a robot. 


Bibliographical Remarks 


The literature on approximate POMDP problem solving was already extensively discussed in 
the last chapter (15.7). The OMDP algorithm described in this chapter is due to Littman et al. 
(1995). The AMDP algorithm for a fixed augmented state representation was developed by Roy 
et al. (1999). Later, Roy et al. (2004) extended it to a learned state representation. Thrun (2000a) 
devised the Monte Carlo POMDP algorithm. 


Exercises 
1. In this question, you are asked to design an AMDP that solves a simple 


navigation problem. Consider the following environment with 12 discrete 
states. 
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Initially, the robot is placed at a random location, chosen uniformly 
among all 12 states. Its goal is to advance to state 7. At any point in time, 
the robot goes north, east, west, or south. Its only sensor is a bumper: 
When it hits an obstacle, the bumper triggers and the robot does not 
change states. The robot cannot sense what state it is in, and it cannot 
sense the direction of its bumper. There is no noise in this problem, just 
the initial location uncertainty (which we will assume to be uniform). 


(a) How many states will an AMDP minimally have to possess? Describe 
them all. 


(b) How many of those states are reachable from the initial AMDP state? 
Describe them all. 


(c) Now assume the robot starts at state 2 (but it still does not know, so its 
internal belief state will be different). Draw the state transition diagram 
between all AMDP states that can be reached within four actions. 


(d) For this specific type problem (noise-free sensors and robot motion, 
finite state, action, and measurement space), can you think of a more 
compact representation than the one used by AMDPs, which is still 
sufficient to find the optimal solution? 


(e) For this specific type problem (noise-free sensors and robot motion, 
finite state, action, and measurement space), can you craft a state space 
for which AMDPs will fail to find the optimal solution? 


2. In the previous chapter, we learned about the tiger problem (Exercise 1 on 
page 544). What modification of this problem will enable QMDP to come 
up with an optimal solution? Hint: There are multiple possible answers. 


3. In this question, we would like you to determine the size of the belief state 
space. Consider the following table: 
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problem number sensors state initial 
number of states sensors transition | state 
#1 3 perfect noise-free | known 
#2 3 perfect noisy known 
#3 3 noise-free | noise-free | unknown (uniform) 
#4 3 noisy noise-free | known 
#5 3 noisy noise-free | unknown (uniform) 
#6 3 none noise-free | unknown (uniform) 
#7 3 none noisy known 
#8 1-dim continuum | perfect noisy known 
#9 1-dim continuum | noisy noisy known 
#10 2-dim continuum | noisy noisy unknown (uniform) 





A perfect sensor always provides the full state information. A noise-free 
sensor may provide partial state information, but it does so without any 
randomness. A noisy sensor may be partial and is also subject to noise. 
A noise-free state transition is deterministic, whereas a stochastic state 
transition is called noisy. Finally, we only distinguish two types of initial 
conditions, one in which the initial state is known with absolute certainty, 
and one in which it is entirely unknown and the prior over states is uni- 
form. 


Your question: What is the size of the reachable belief space for all 10 
problems above? Hint: It may be finite or infinite, and in the infinite case 
you should be able to tell what dimension the belief state has. 


4. We want you to brainstorm about the failure modes of an AMDP plan- 
ner. In particular, the AMDP learns state transition and reward functions. 
Brainstorm what can go wrong with this when such learned models are 
used for value iteration. Identify at least three different types of problems, 
and discuss them in detail. 
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Introduction 


This chapter focuses on the robotic exploration problem. Exploration is the 
problem of controlling a robot so as to maximize its knowledge about the 
external world. In many robotics applications, the primary purpose of the 
robotic device is to provide us, the users, with information. Some environ- 
ments may be plainly inaccessible to people. In others, it may be uneco- 
nomical to send people, and robots may be the most economical means of 
acquiring information. The exploration problem is paramount in robotics. 
Robots have explored abandoned mines, nuclear disaster sites, even Mars. 

Exploration problems come in many different flavors. For example, a robot 
may seek to acquire a map of a static environment. If we represent the envi- 
ronment by an occupancy grid map, the exploration problem is the problem 
of maximizing the cumulative information we have about each grid cell. A 
more dynamic version of the problem may involve moving actors: For ex- 
ample, a robot might have the task to find a person in a known environment, 
as part of a pursuit evasion problem. The goal might be to maximize the infor- 
mation about the person’s whereabouts, and finding a person might require 
exploring the environment. However, since the person might move, the ex- 
ploration policy may have to explore areas multiple times. A third explo- 
ration problem arises when a robot seeks to determine its own pose during 
localization. This problem is commonly called active localization, and the goal 
is to maximize the information about the robot’s own pose. In robotic manip- 
ulation, an exploration problem arises when a manipulator equipped with a 
sensor faces an unknown object. As this brief discussion suggests, explo- 
ration problems arise virtually everywhere in robotics. 

At first glance, one might conclude that the exploration problem is already 
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fully subsumed by the POMDP framework discussed in previous chapters. 
As we have shown, POMDPs naturally engage in information gathering. To 
turn a POMDP into an algorithm whose sole goal it is to maximize informa- 
tion, all we have to do is supply it with an appropriate payoff function. One 
appropriate choice is the information gain, which measures the reduction in 
entropy of a robot's belief as a function of its actions. With such a payoff 
function, POMDPs solve the exploration problem. 

However, exploring using the POMDP framework is often not such a good 
idea. This is because in many exploration problems, the number of unknown 
state variables is huge, as is the number of possible observations. Consider, 
for example, the problem of exploring an unknown planet. The number of 
variables needed to describe the surface of a planet will be enormous. So will 
be the set of possible observations a robot may make. We already found that 
in the general POMDP framework the planning complexity grows doubly 
exponential in the number of observations (in the worst case); hence calculat- 
ing a value function is plainly impossible. In fact, given the huge number of 
possible values for the unknown state variables in exploration, any algorithm 
that integrates over all possible such values will inevitably be inapplicable to 
high dimensional exploration problems, simply for computational reasons. 

This chapter discusses a family of practical algorithms that can solve high- 
dimensional exploration problems. The techniques discussed here are all 
greedy. Put differently, their look-ahead is limited to one exploration action. 
However an exploration action can involve a sequence of control actions. 
For example, we will discuss algorithms that select a location anywhere in 
the map to explore; moving there is considered one exploration action. The al- 
gorithms discussed here also approximate the knowledge gain upon sensing, 
so as to reduce the computation involved. 

This chapter is organized as follows 


* We begin with the general definition of information gain in exploration, 
for the discrete and the continuous case. We define the basic greedy ex- 
ploration algorithm that selects actions so as to maximize its information 
gain. 


* We then analyze a first special case of robotic exploration: active localiza- 
tion. Active localization pertains to the choice of actions while globally 
localizing a robot. The application of our greedy exploration algorithm, 
under an appropriate definition of the action space yields a practical so- 
lution to this problem. 
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* We also consider the problem of exploration in occupancy grid mapping. 
We derive a popular technique called frontier-based exploration, in which a 
robot moves to its nearest frontier. 


* Furthermore we describe an extension of our exploration algorithm to 
multi-robot systems and show how a team of mobile robots can be con- 
trolled so as to efficiently explore an unknown environment. 


* Finally we consider the application of exploration techniques to the SLAM 
problem. We show how to control a robot so as to minimize the uncer- 
tainty, using FastSLAM as our example SLAM approach. 


The exploration techniques described below have been widely used in the 
literature and in a variety of practical applications. They also span a number 
of different representations and robotics problems. 


Basic Exploration Algorithms 


Information Gain 


The key to exploration is information. We already encountered a number of 
uses for information in probabilistic robotics. In the context of exploration, 
we define the entropy H,(x) of a probability distribution p as the expected 
information E|— log p] 


Hz) = - | ple) logp(e) de or -5 ple) logpiz) 

z 
The entropy was already briefly discussed in Chapter 2.2 of this book. H, (x) 
is at its maximum if p is a uniform distribution. It is at its minimum when 
p is a point-mass distribution; however, in certain continuous cases such as 
Gaussians we might never reach full confidence. 

Conditional entropy is defined as the entropy of a conditional distribution. 
In exploration, we seek to minimize the expected entropy of the belief after 
executing an action, hence it is natural to condition on the measurement z 
and the control u that define the belief state transition. 

Holding with the notation introduced before, let us use B(b, z, u) to denote 
the belief after executing control u and observing z under the belief b. The 
conditional entropy of the state x’ after executing action u and measuring z 
is given by 


Hye" | z,u) = -J B(b, z ,u)(x') log B(b, z, u) (a^) da’ 
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Here B(b, z, u) is computed using the Bayes filter. In robotics, we only have a 
choice over the control action u, we cannot pick z. Consequently, we consider 
the conditional entropy of the control u, with the measurement integrated 


out: 
Hy(a' | u) E,|[H(z' | z, u)] 


f [me | z,u) p(z | x) p(x | ux) b(x) dz da’ dx 


Q 


Notice that this is only an approximation, as the final expression inverts the 
oder of a summation and a logarithm. The information gain associated with 
action u in belief b is thus given by the difference 


(u) = H)(«) - EHy(z | z,u)] 


Greedy Techniques 


The expected information gain lets us phrase the exploration problem as a 
decision-theoretic problem of the type addressed in previous chapters. In 
particular, let r(x, u) be the cost of applying control action u in state x; here 
we assume r(z,u) < 0 to keep our notation consistent. Then the optimal 
greedy exploration for the belief b maximizes the difference between the in- 
formation gain and the costs, weighted by a factor o. 


"(b = argmax a (Hy(z) — E;[Hw(z' | z,u)]) + fre b(x) dx 


expected information gain expected costs 
The factor o relates information to the cost of executing u. It specifies the 
value a robot assigns to information, which measures the price it is willing 
to pay in terms of costs for obtaining information. 

Equation (17.5) resolves to 


"(b = argmax —a E,|Hy(z! | z,u)] + [re b(x) dx 
= argmax /ro —a« f [ 5e | z, u) plz | 2’) 
p(x | u, x) dz da] b(x) dx 


In short, to understand the utility of the control u, we need to compute the 
expected entropy after executing u and observing. This expected entropy is 
obtained by integrating over all possible measurements z that we might be 
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Figure 17.1 Unpredictability of the exploration problem: A robot in (a) might an- 
ticipate a sequence of three controls, but whether or not this sequence is executable 
depends on the things the robot finds out along the way. Any exploration policy has 
to be highly reactive. 


receiving, times their probability. It is translated into utility via the constant 
a. We subsequently subtract the expected cost of executing action r. 

Most exploration techniques employ this greedy policy, which is indeed 
optimal at horizon 1. The reason for relying on greedy techniques is because 
of the enormous branching factor in exploration, which renders multi-step 
planning impossible. The large branching factor is due to the very nature 
of the exploration problem. The goal of exploration is to acquire new infor- 
mation, but once such information has been acquired, the robot is in a new 
belief state, hence has to adjust its policy. Hence, measurements are inher- 
ently unpredictable. 

Figure 17.1 illustrates such a situation. Here a robot has mapped two 
rooms and part of a corridor. At this point, the optimal exploration might in- 
volve exploring one corridor, and an appropriate sequence of actions might 
correspond to the one shown in Figure 17.1a. However, whether or not such 
an action sequence is executable is highly unpredictable. For example, the 
robot may find itself in a dead end as illustrated in Figure 17.1b where the 
anticipated action sequence is not applicable. 


Monte Carlo Exploration 


The algorithm Monte_Carlo_exploration is a simple probabilistic explo- 
ration algorithm. Table 17.1 depicts this Monte Carlo approximation of the 
greedy exploration rule in Equation (17.6). This algorithm simply replaces 
the integrals in the greedy approach by sampling. In line 4, it samples a 
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A 


Algorithm Monte Carlo exploration(b): 


set p,, = 0 for all actions u 


fori = 1 to N do 
sample x ~ b(x) 
for all control action u do 
sample z' ~ p(z' | u, x) 
sample z ~ p(z | x’) 
bv = Bayes filter(b, z, u) 
Pu = Put r(z,u) — a Ay (2’) 
endfor 
endfor 


mi. j 
= C 


A 
m 


return argmax pu 
u 








Table 17.1 A Monte Carlo implementation of the greedy exploration algorithm, 
which chooses actions by maximizing the trade-off between information gain and 
cost. 


state x from the momentary belief b, followed by sampling the next state 2’ 
and the corresponding observation z’. A new posterior belief is then calcu- 
lated in line 8, and its entropy-cost trade-off is cached away in line 9. Line 
12 then returns the action whose Monte Carlo information gain-cost value is 
the highest. 

In general, the greedy Monte Carlo algorithm may still require substantial 
time, to a point that it becomes impractical. The main complication arises 
from the sampling of measurements z. When exploring an unknown en- 
vironment during mapping, the number of possible measurements can be 
huge. For example, for a robot equipped with 24 ultrasound sensors that 
each report one byte of range data, the number of potential sonar scans ob- 
tained at a specific location is 25674. Clearly, not all of those measurements 
are plausible, but the number of plausible measurements is at least as large 
as the number of plausible local maps. And for any realistic mapping prob- 
lem, the number of possible maps is immense! Below, we will consider ex- 
ploration techniques that side-step this integration through a closed-form 
analysis of the expected information gain. 
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Multi-Step Techniques 


In situations where the measurement and the state spaces are confined, it 
may be possible to generalize the principle of information gathering to any fi- 
nite horizon T > 1. Suppose we would like to optimize the information-cost 
trade-off at horizon T. This is achieved by defining the following exploration 
payoff function: 


frenu) b(xz) da, ift «T 


Texp (be, Ut) = 
a Ay, (x+) ift=T 


Under this payoff function, a POMDP planner then finds a control policy 
that minimizes the entropy of the final belief br minus the cost of achieving 
this belief, scaled appropriately. All of the previously discussed POMDP 
techniques are applicable. 

The reader may notice the similarity to the augmented MDP discussed in 
the previous chapter. The difference here is that we only specify the payoff 
function, not the belief state representation. Because most exploration prob- 
lems become computationally intractable under the general POMDP model, 
we will not pursue this approach any further in this book. 


Active Localization 


The simplest case of exploration occurs when estimating the state of a low- 
dimensional variable. Active localization is such a problem: Here we seek 
information about the robot pose x+, but we are given a map of the environ- 
ment. Active localization is particularly interesting during global localiza- 
tion, since here the control choice can have an enormous impact on the in- 
formation gain. In many environments, wandering around aimlessly makes 
global localization hard, whereas moving to the right location can yield very 
fast localization. 

Such an environment is depicted in Figure 17.2a. Here the robot is placed 
within a symmetric corridor, and no matter how long it explores, it cannot 
resolve its pose without leaving the corridor and moving through one of 
the open doors. Thus, any solution to the active localization problem must 
ultimately steer the robot out of the corridor and into one of the rooms. 

Active localization can be solved greedily, along the lines of the algorithm 
just presented. The key insight pertains to the choice of action representa- 
tion. Clearly, if actions are defined as low-level control actions as in much of 
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(a) Environment with an example posterior. (b) Effect of exploration action. 





Figure 17.2 (a) Active localization in a symmetric environment: Shown here is an 
environment with a symmetric corridor, but an asymmetric arrangement of rooms, 
labeled A, B, and C. This figure also shows an exploration path. (b) An example of 
the exploration action "go backward 9 meters, go left 4 meters." If the robot's pose 
posterior possesses two distinct modes as shown here, the actual control in global 
world coordinates might lead to two different places. 


this book, any viable exploration plan would have to involve a long chain of 
controls, before any pose ambiguity can be resolved. To solve the active local- 
ization problem through greedy exploration, we need a definition of actions 
through which the robot can gather pose information greedily. 

One possible solution is to define an exploration action through target loca- 
tions, expressed in the robot's local coordinate frame. For example, “move to 
the point Ax — —9m and Ay — 4m relative to the robot's local coordinate frame" 
can be considered an action, so long as we can devise a low level navigation 
module that can map this action back into low-level controls. Figure 17.2b 
visualizes the potential effect of this action in global world coordinates. In 
this example, the posterior possesses two modes, hence this action can carry 
the robot to two different locations. 

The definition of relative motion actions makes it possible to solve the ac- 
tive localization problem through an algorithm that is essentially the same 
as the greedy exploration algorithm in Table 17.1. We will describe this algo- 
rithm through an example. Figure 17.3a shows the active localization path 
along with a number of labeled places. We begin in the middle of localiza- 
tion: Figure 17.3b shows the belief after moving from the location labeled "1" 
to the location labeled "2". This belief possesses six modes, each indicated 
by a circle in Figure 17.3b. For this belief, the expected occupancy in robot 
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(a) Path of the localizing robot 
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(b) Early belief distribution with six modes 








(d) Expected costs of motion 











(f) Gain plus costs (the darker, the better) 
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Figure 17.3 Illustration of active localization. This figure displays a number of aux- 
iliary functions for computing the optimal action, for a multi-hypothesis pose distri- 


bution. 
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(a) Belief distribution (b) Occupancy prob. in robot coordinates 





















































(c) Expected costs of motion (d) Exp. information gain in robot coordinates 
4C n 1 
TCU 
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(e) Gain plus costs (the darker, the better) (f) Final belief after active localization 























Figure 17.4 Illustration of active localization at some later point in time, for a belief 
with two distinct modes. 
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coordinates is shown in Figure 17.3c. This figure is simply obtained by su- 
perimposing the known occupancy grid map for each of the possible robot 
poses, weighted by the respective probability. Since the robot does not know 
its pose with certainty, it cannot know the occupancy of locations; hence the 
“fuzziness” of the expected cost map. However, with high likelihood the 
robot is in a corridor-shaped area that is traversable. 

While Figure 17.3c depicts the cost of being at a target location, we need the 
cost of moving to such a target location. We have already encountered an al- 
gorithm for computing such motion costs, along with the optimal path: value 
iteration (see Chapter 14). Figure 17.3d shows the result of value iteration, ap- 
plied to the map in Figure 17.3b as cost function. Here value is propagated 
from the robot outwards (as opposed to from the goal, as was the notion in 
Chapter 14). This makes it possible to calculate the cost of moving to any 
potential target location in this map. 

As this figure illustrates, there exists a large region near the robot where 
motion is safe; in fact, this region corresponds to the corridor, no matter 
where the robot actually is located. Moving out of this region and into one 
of the rooms incurs higher expected costs, since the validity of such a motion 
hinges on the exact location of the robot, which is unknown. 

For greedy exploration, we now need to determine the expected informa- 
tion gain. It can be approximated by placing the robot in a location, simulate 
possible range measurements, incorporate the result, and measure the in- 
formation after the Bayesian update. Repetition of this evaluation step for 
each possible location yields an expected information gain map. Figure 17.3e 
shows the result: the darker a location in this map, the more information it 
provides relative to the robot’s pose estimate. Obviously, any of the rooms 
will be most informative, as will be the far end of the corridors. Thus, from a 
pure information gathering perspective, the robot should seek to move into 
one of the rooms. 

Adding this cost map to the expected information gain leads to the plot in 
Figure 17.3f: the darker a target location, the better. While the outside rooms 
fare still high in this combined function, their utility has been reduced by the 
relatively high costs of moving there. At this point, areas at the end of the 
corridor score the highest. 

The robot now moves to the location of the highest combined value, which 
brings it to the outer area of the corridor that is still safe to travel. In Fig- 
ure 17.3a, this corresponds to a transition from the location marked “2” to 
the one marked “3”. 

The greedy active exploration rule is now reiterated. The belief at location 
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“3” is depicted in Figure 17.4a. Obviously, the previous exploration action 
had the effect of reducing the modes in the posterior, from six down to two. 
Figure 17.4b shows the new occupancy map, in robot-centric coordinates; 
and Figure 17.4c depicts the corresponding value function. The expected 
information gain is now uniformly high only in the rooms, as illustrated in 
Figure 17.4d. Figure 17.4e shows the combined gain-cost map. At this point, 
the cost of moving into any of the symmetrically open rooms is the lowest, 
hence the robot moves there, at which the ambiguity is largely resolved. One 
time step later, after another round of deliberation, the final belief is the one 
shown in Figure 17.4f. 

This greedy active localization algorithm is not without shortcomings. 
One shortcoming stems from its greediness: It cannot compose multiple ex- 
ploration actions that together maximize the knowledge gain. Another short- 
coming is the result of our action definition. The algorithm fails to consider 
the measurements that will be acquired while moving to a target location. In- 
stead, it treats each such motion as an open-loop control, in which the robot is 
unable to react to its measurements. Clearly, when faced with a closed door, 
a real robot can abandon a target point even before reaching it; however, no 
such provision is considered during planning. This explains the sub-optimal 
choice in our example above, where the robot explores the room labeled “B” 
before it explores the room labeled “A.” As a result, localization tends to take 
longer than theoretically necessary. Nevertheless, this algorithm performs 
well in practice. 


Exploration for Learning Occupancy Grid Maps 


Computing Information Gain 


Greedy exploration can also be applied in robotic mapping. Mapping prob- 
lems involve many more unknown variables than robot localization, hence 
we need a technique for calculating expected knowledge gain that scales to 
high-dimensional estimation problems. As we shall see, the “trick” for ex- 
ploration in occupancy grid maps is the very same trick that led us to define 
an efficient update algorithm for occupancy grid maps: We treat information 
gain as independent between different grid cells. 

Consider an occupancy grid map, such as the one shown in Figure 17.5a. 
Parts of this map are well-explored, such as the large free area in the center of 
the map, and the many walls and obstacles whose locations are well-known. 
Other parts remain unexplored, such as the large gray area outside the map. 
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(a) Occupancy grid map (b) Cell entropy 























(c) Explored and unexplored space (d) Value function for exploration 
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Figure 17.5 Example of the essential step in exploration for mapping. (a) shows a 
partial grid map; (b) depicts the map entropy; (c) shows the space for which we have 
zero information; and (d) displays the value function for optimal exploration. 
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(a) Map segment (b) Entropy (c) Exp. information gain 



































Figure 17.6 Map, entropy and expected information gain. This figure illustrates 
that with the appropriate scaling, entropy and expected information gain are nearly 
indistinguishable. 


Greedy exploration steers the robot to the nearest unexplored area, where the 
information gain is maximal. This raises the question as to how to compute 
gain. 

We will discuss three possible techniques. All three approaches have in 
common that they calculate the information gain per grid cell, and not as a 
function of the robot action. This conveniently leads to an information gain 
map, which is a 2-D map defined over the same grid as the original grid map. 
The difference between these techniques is the quality of the approximation. 


* Entropy. Computing the entropy of each cell is straightforward. Let us 
denote the i-th cell by m;, and its occupancy probability p; 2 p(m;). Then 
the entropy of the binary occupancy variable is given by the following 
sum: 


Hy(m;) = -—pilogpi — (1— pi) log(1— pi) 


Figure 17.5b depicts the entropy, for each of the cells in the map shown in 
Figure 17.5a. The brighter a location, the larger the entropy. Most of the 
central areas in the map exhibit low entropy, with the exception of a few 
areas near or inside obstacles. This matches our intuition, as most of the 
map is already well-explored. The outer areas possess high entropy, indi- 
cating that they might be good areas to explore. Thus, the entropy map 
indeed assigns high information values to places that remain unexplored. 


* Expected information gain. Technically, the entropy only measures the 
current information. It does not specify the information a robot would 
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gain through its sensors when in a vicinity of a cell. The calculation of 
the expected information gain is slightly more involved, and it requires 
additional assumptions on the nature of the information provided by the 
robot's sensors. 


Suppose our sensor measures with probability ptrue the correct occupancy, 
and it errs with probability 1 — ptrue. Then we would expect to measure 
"occupied" with the following probability: 


(17.9) p* = Ptrue Di + (1 "n Pius) (1 e pi) 


The standard occupancy grid update will then yield the new probabil- 
ity, which follows directly from the occupancy grid mapping algorithm 
discussed in Chapter 9: 


Ptrue Pi 
(17.10) pts 
Ptrue Pi + (1 — Ptrue) (1 d pi) 





The entropy of this posterior is now: 


十 
(17.11) Hj, (mi) 
Ptrue Pi 1 Ptrue Pi 
og 
Ptrue Pi T (1 RR Ptrue) (1 "e pi) Ptrue Pi T (1 EN Ptrue) (1 P pi) 
(1 m. Ptrue) (1 UR pi) log (1 NE Ptrue) (1 ERE pi) 
Ptrue Pi T (1 V Pirae) (1 zu pi) Ptrue Pi T (1 V Ptrue) (1 PE pi) 




















By analogy, our sensor will sense "free" with probability 
(17.12) D = Ptrue (1 a pi) + (1 n Ptrue) Pi 


in which case the posterior will become 


(1 Py Ptrue) Di 
(17.13) p = 
Ptrue (1 lb pi) 十 (1 n Ptrue) Di 





This posterior has the entropy 














(17.14) H7 (m;) 
zs (1 = Ptrue) Pi iba (1 — Ptrue) Pi 
Ptrue (1 — pi) + (1 — Dirue) Pi Ptrue (1 — pi) + (1 — Ptrue) Pi 
Dtrue (1 — pi) i Pisae (1 — pi) 
fuss (1 — pi) + (1 — Pirue) Pi Duud (L — pi) + A puse] Pi 
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Putting the previous equations together, we obtain the expected entropy 
after sensing: 


E|Hp(m;) = pt Hj(m) + p Ay (mi) 
Ptrue Pi 
Ptrue Pi + (1 — Pirue) (1 — pi) 
(1 — ptrue) (1 — pi) 
Ptrue Pi + (1 — Prue) (1 — pi) 
(1 = Ptrue) Pi 
Pius (E — pi) + (L— Paus) Pi 
(l= pug) Pi 
Ptrue (1 — pi) + (1 — Dirue) Pi 





= —Ptrue Pi log 





-(1 — Ptrue) (1 m pi) log 





zu t Ptrue) Pi log 





—Ptrue (1 t pi) log 





Following our definition in Equation (17.4), the expected information 
gain upon sensing the grid cell m; is simply the difference H,(m;) 一 
EH, (m,)]. 


So how much better is the expected information gain than just the entropy 
from which it was derived? The answer is: not much. Figure 17.6 plots 
the entropy in Panel (b), next to the expected information gain in Panel 
(c), all for the map segment shown in Panel (a). Visually, the entropy 
and the expected information gain are nearly indistinguishable, although 
the values are different. This “justifies” the common practice of using 
entropy—instead of the expected information gain—as a function to di- 
rect the exploration. 


Binary gain. The third approach is the simplest of all, and by far the most 
popular. A very crude approximation of the expected information gain is 
to mark cells that have been updated at least once as "explored," and all 
other cells ‘unexplored.’ Thus, the gain becomes a binary function. 


Figure 17.5c shows such a binary map: Only the outer white area promises 
any new information; the map interior is assumed to be fully explored. 
While this information gain map is clearly the crudest of all the approxi- 
mations discussed here, it tends to work well in practice, in that it pushes 
an exploring robot continuously into unexplored terrain. This binary map 
is at the core of a popular exploration algorithm called frontier-based explo- 
ration, in which the robot continuously moves to the nearest unexplored 
frontier of the explored space. 
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Propagating Gain 


The remaining question now pertains to the development of a greedy ex- 
ploration technique that utilizes these maps. As in our active localization 
example, this requires the definition of an appropriate exploration action. 

A simple yet effective definition of an exploration action involves moving to 
an x-y location along a minimum cost path, and then sensing all grid cells in 
a small circular diameter around the robot. Thus, each location in the map 
defines a potential exploration action. 

The computation of the best greedy exploration action is now easily done 
using value iteration. Figure 17.5d shows the resulting value function for 
the map shown in Figure 17.5a, and the binary information gain map in Fig- 
ure 17.5c. The value iteration approach assumes such a binary gain map: 
Gain can only be reaped at unexplored locations. 

The central value update is implemented by the following recursion: 


max; r(mi, m;) 十 Vr—i(m,;) if I(m;) =0 
Vr (mi) = 
I(m;) if I(m;) > 0 


Here V is the value function, j indexes over all grid cells adjacent to m;, r 
measures the cost of moving there (usually a function of the occupancy grid 
map), and J(m,) is the information we would gain in cell m;. The termina- 
tion condition J(m;) > 0 is only true for unexplored grid cells in the binary 
information gain map. 

Figure 17.5d shows such a value function after convergence. Here the 
value is highest near the open areas of the map, and lower in the interior 
of the map. The exploration technique now simply determines the optimal 
path by hill-climbing in this map. This path leads the robot directly to the 
nearest unexplored frontier. 

Clearly, this exploration technique is just a crude approximation. It entirely 
ignores the information acquired as the robot moves to a target location, in 
that it falsely assumes that no sensing takes place along the way. However, it 
tends to work well in practice. Figure 17.7 shows a value function and a map 
of an exploring robot. This map is historical: it is taken from the 1994 AAAI 
Mobile Robot Competition, which involved the acquisition of an environment 
map at high speeds. The robot was equipped with an array of 24 sonar sen- 
sors, which accounts for the relatively low accuracy of the map. The most 
interesting aspect of this figure pertains to the robot path: As can be seen 
in Figure 17.7b, initially the exploration is highly effective, with the robot 
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(a) Exploring value function 











(b) Exploration path 

















Figure 17.7 Illustration of autonomous exploration. (a) Exploration values V, com- 
puted by value iteration. White regions are completely unexplored. By following the 
gray-scale gradient, the robot moves to the next unexplored area on a minimum-cost 
path. The large black rectangle indicates the global wall orientation 0,41. (b) Actual 
path traveled during autonomous exploration, along with the resulting metric map. 
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Figure 17.8 (a) Urban robot for indoor and outdoor exploration. The urban robot's 
odometry happens to be poor. (b) Exploration path of the autonomously exploring 
robot, using the exploration techniques described in the text. 


roaming unexplored corridors. Later, however, the robot begins to alternate 
between different goal locations. Such alternate behavior is quite common 
for greedy exploration techniques, and most contemporary implementations 
provide additional mechanisms for avoiding such behavior. 

A second example is shown in Figure 17.8. The path of the robot, shown 
on the right, illustrates the efficiency of the greedy exploration algorithm. 


Extension to Multi-Robot Systems 


The gain-seeking exploration rule has frequently been extended to multi- 
robot systems, in which the robot seeks to acquire a map through cooperative 
exploration. In general, the speed-up when using K robots is linear. It can 
be super-unitary: IK robots tend to speed-up the time required for exploration 
by more than a factor of K when compared to one robot. Such super-unitary 
speed-up is due to the fact that a single robot might have to traverse many 
areas twice, once to go out and explore, and once to come back to explore 
some other part of the environment. With an appropriate number of robots, 
the return portion may become unnecessary, and the speed-up is closer to a 
factor of 2K. This factor of 2K is an upper bound for robots that can navigate 
easily in all directions. 

The key provision for multi-robot exploration pertains to coordination. In 
static exploration, this is easily accomplished through greedy task allocation 
techniques. Consider a situation in which K robots are placed in a partially 
explored map. The setup now is that there exists a number of frontier places 
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1: Algorithm multi robot exploration(m, z1,..., £g): 
2: for each robot k = 1 to K do 
3: Let m, be the grid cell that contains x, 
. . Jf co if m; zm, 
s UR { 0 if m; =m, 
5: do until Vp converged 
6: for all i do 
7: Vi (m;) — min (Vi(m;), r(m;, mj) + Vc(m;)] 
j 
8: endfor 
9: endwhile 
10: endfor 
11: compute the binary gain map m from the map m 
12: for each robot k = 1 to K do 
13: set goal, = ^ argmin Vi (mi) 
i such that m;=1 
14: for all cells m; in e-neighborhood of goal, 
15: set m; = 0 
16: endfor 
17: endfor 
18: return {goal,, goal,,..., goal; } 








Table 17.2 Multi-robot exploration algorithm. 


to explore, and we need an algorithm that assigns such robots to places in a 
way that greedily maximizes the overall exploration effect. 

The algorithm multi robot exploration in Table 17.2 is a very simple 
such algorithm. It computes for a set of K robots a set of K exploration 
goals, which are locations to which the robots move during coordinated ex- 
ploration. 

The algorithm first computes value function Vg, one for each of the robots 
(lines 2 through 10). However, these value functions are defined differently 
from the ones encountered thus far: The minimum value is attained at the 
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robot pose. The further out a cell, the higher its value. Figure 17.9 and 17.10 
show several examples of such value functions. In each case, the location of 
the robot possesses minimum value, with the value increasing throughout 
the reachable space. 

It is easy to show that these value functions measure for any possible grid 
cell the cost of moving there. For each individual robot, the optimal frontier 
cell to explore is now computed via line 13: it is the minimum cost cell in Vp 
which is yet unexplored, according to a binary gain map computed in line 
11. This cell is used as the goal point. However, to “discourage” other robots 
from using the same or a nearby goal location, our algorithm resets the gain 
map to zero in the vicinity of the chosen goal location. This takes place in 
lines 14 through 16. 

The coordination mechanism in multi_robot_exploration can be summa- 
rized as follows: Each robot greedily picks the best available exploration goal 
point, and then prohibits other robots from selecting the same or a nearby 
point. Figures 17.9 and 17.10 illustrate the effect of this coordination. Both 
robots in Figure 17.9, although located at different places, identify the same 
frontier cell for exploration. Thus, when exploring without coordination, 
they would both aim for the same area to explore. This is different in Fig- 
ure 17.10. Here the first robot makes its choice, out-ruling this location for 
the second robot. In turn, the second robot selects a superior location. The 
resulting joint exploration action avoids this conflict and, as a result, explores 
more efficiently. 

Clearly, the coordination mechanism is quite simplistic, and it is easily 
trapped in a local minimum. What would have happened if in Figure 17.10 
we would have let the second robot choose first? Then the first robot would 
have been forced to select a far-away target location, and both robots’ paths 
would have crossed along the way. Obviously, a good criterion for a subop- 
timal assignment is that paths cross. However, the absence of such a path 
crossing does not guarantee an optimal assignment. 

Improved coordination techniques take such conflicts into consideration, 
and enable robots to trade goal points with each other. A popular family 
of techniques enables robots to swap assignments of goal points with each 
other, if this reduces the overall exploration costs. Such algorithms are often 
phrased in terms of auction mechanisms. The resulting algorithms are often 
characterized as market-based algorithms. 

Figure 17.11 shows an application of algorithm multi_robot_exploration 
in a real-world experiment. Here three robots are exploring an unknown 
environment. The leftmost image shows all the robots at their starting lo- 
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(a) (b) 




















Figure 17.9 Two robots exploring an environment. Without any coordination both 
vehicles would decide to approach the same target location. Each image shows the 
robot, the map, and its value function. The black rectangle indicates the target points 
with minimum cost. 





(a) (b) 




















Figure 17.10 Target positions obtained using the coordination approach. In this case 
the target point for the second robot is to the left in the corridor. 


cations. The other images depict different situations during the coordinated 
exploration. A map constructed by the same robots in an additional run is 
depicted in Figure 17.12. As can be seen, the robots in fact were nicely dis- 
tributed over the environment. 

Figure 17.13 depicts the performance of this algorithm compared to a team 
of robots from which all robots apply Algorithm Monte Carlo exploration 
without any coordination. Whereas the horizontal axis depicts the number of 
robots in the team, the vertical axis shows the number of time steps needed 
to complete the exploration task. In these experiments it was assumed that 
the robots always share their local maps. It was furthermore assumed that 
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Figure 17.11 Coordinated exploration by a team of mobile robots. The robots dis- 
tribute themselves throughout the environment. 





Figure 17.12 Map of a 62 x 43m? large environment learned by three robots in 8 
minutes. 
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Figure 17.13 Exploration time obtained in a simulation experiment in which robot 
teams of different sizes explore the environment shown in the left image. 
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Start robot A 


Start robot B 


Robot B decides to 
verify hypothesis 


Figure 17.14 Coordinated exploration from unknown start locations. The robots 
establish a common frame of reference by estimating and verifying their relative loca- 
tions using a rendezvous approach. Once they meet, they share a map and coordinate 
their exploration. Courtesy of Jonathan Ko and Benson Limketkai. 


all robots started close to each other. The result is quite telling. Obviously, 
the uncoordinated robots are significantly less efficient than the coordinated 
team. 

The coordination strategies discussed so far assume that the robots share a 
map and know their relative locations. By considering the uncertainty in the 
relative robot locations, multi robot exploration can be extended to the case 
when the robots start from different, unknown locations. 

Figure 17.14 shows an exploration run using such an extended coordina- 
tion technique. The two robots, A and B, have no knowledge about their rel- 
ative start locations. Initially, the robots explore independently of each other. 
As they explore, each robot estimates the other robot's location relative to 
its own map, using a modified version of MCL localization. When deciding 
where to move next, both A and B consider whether it is “better” to move to 
an unexplored area, or to verify a hypothesis for the other robot's location. 
At one point, B decides to verify a hypothesis for A’s location. It sends A a 
message to stop and moves to A’s hypothesized location (marked as meet- 
ing point in Figure 17.14). Upon reaching this location, both robots check 
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the presence of the other robot using their laser range-finders (robots are 
tagged with highly reflective tape). When they detect each other, their maps 
are merged and the robots share a common frame of reference. From then 
on, they explore the environment using algorithm multi_robot_exploration. 
Such an exploration technique from unknown start locations can be applied 
to scenarios involving more than two robots. 


Exploration for SLAM 


The final algorithm in this book applies the greedy exploration idea to a full 
SLAM algorithm. In the previous sections we always assumed that either the 
map or the pose of the robot was known. In SLAM, however, we do not know 
either. Accordingly, we have to consider the uncertainty in the map as well 
as the uncertainty in the robot’s pose when choosing how to explore, and 
we can gain or lose knowledge about both. Obviously, without knowledge 
about the pose of the vehicle, the integration of sensor information into the 
map can lead to serious errors. On the other hand, a robot that solely focuses 
on reducing the uncertainty about its pose will simply not move, and thus it 
will never acquire any information about the environment beyond its initial 
sensor radius. 


Entropy Decomposition in SLAM 


The key insights for optimal exploration in SLAM is that the entropy of the 
SLAM posterior can be decomposed into two terms: One pertaining to the en- 
tropy in the pose posterior, and one pertaining to the expected entropy of the 
map. In this way, an exploring SLAM robot trades off the uncertainty in the 
robot pose with the uncertainty in the map. Control actions tend to reduce 
only one of both: when closing a loop, a robot will mainly reduce its pose 
uncertainty. When moving into open unexplored terrain, it will mainly re- 
duce its map uncertainty. By considering both, whatever reduction is larger 
will win, and the robot may sometimes move into open terrain, sometimes 
re-localize by moving back into known terrain. 

The decomposition of the entropy is in fact universal for the full SLAM 
problem. Consider the full SLAM posterior. 


p(zia,m | Zits Urt) 
This posterior can be factored as follows: 


p(zia,m | Zit, Ult) = D(Z1: | Zit, U1:t) p(m | 21:45, Z1:t Ut) 
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(17.20) 
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This is trivial, and it was already stated in Equation (13.2) on page 442. It is 
less obvious that this implies 


A[p(t14,m | 21:5 03:t)] 


= Hlip(zia | z162143)] + E [H[p(m | £1:t, 21:2, 91:)]] 


where the expectation is taken over the posterior probability p(x1 | 
2145 U1:)- 

Writing p(r,m) as an abbreviation for the full posterior p(zi4,m | 
214, U14), We can derive the additive decomposition from first principles: 


H(z,m) = Ex[-logp(z,m)] 
= Ex ml- log(p(zx) p(m | z))] 
= Ex m[—log p(x) — log p(m | z)] 
= Ez m|—logp(x)] + Ex,m[— log p(m | z)] 
= Byl-logp(a))+ [ —p(2,m)logp(m | 2) dz dm 


= E,[-logp(x)] + J E E log p(m | 2) de dm 





= Ej|-logp(z)] + l p(z) / Kai E E pan 


= H(x)+ / p(z) H(m | £) da 
= H(x) + E,H(m |2) 





This transformation directly implies the decomposition in (17.19). It proves 
that the SLAM entropy is the sum of the path entropy and the expected en- 
tropy of the map. 


Exploration in FastSLAM 


The entropy decomposition is now leveraged into an actual SLAM explo- 
ration algorithm. Our approach is based on the FastSLAM algorithm de- 
scribed in Chapter 13 of this book, and in particular the grid-based Fast- 
SLAM algorithm described in Chapter 13.10. Recall that FastSLAM repre- 
sents the SLAM posterior by a set of particles. Each particle contains a robot 
path. In the case of the grid-based implementation, each particle also con- 
tains an occupancy grid map. This makes the entropy measures for occu- 
pancy grid maps applicable, as were just discussed in the previous section. 


»js.on 000000 





17. Exploration for SLAM 595 
1: Algorithm FastSLAM_exploration(Y;): 

2: initialize h = oo 

3: repeat 

4: propose an exploration control sequence Wi+l:T 

5: select a random particle y, € Y; 

6: forr =t+1toT 

7 draw x, ~ p(z, | uz, z4—1) 

8: draw z, ~ p(z, | £+) 

9: compute Y, = FastSLAM(z,, u7, Y, 1) 

10: endfor 

11: fit a Gaussian Hz, 3, to all pose particles (al) € Yr 
12: h = log det(Xz) 

13: for particles k = 1 to M do 

14: let m be the map mi! from the k-th particle in Yr 
15: update h = h + + Hm] 

16: endfor 

17: if h < h then 

18: seth — h 

19: set 化 二 17 = Ut+1:T 

20: endif 

21: until convergence 

22: return Ut41:7 








Table 17.3 The exploration algorithm for the grid-based version of FastSLAM. It 
accepts as input a set of particles Y;. Each particle mu contains a sampled robot 
path zl and an associated occupancy grid map ml*l. Tt outputs an exploration path, 
expressed in relative motion commands. 
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An algorithm for determining an exploration action sequence is illustrated 
in Table 17.3. This algorithm leaves a number of important implementation 
questions open, hence it shall only serve as a schematic illustration. How- 
ever, it characterizes all main steps of an actual implementation of the SLAM 
exploration ideas. 

The FastSLAM exploration algorithm is essentially a test-and-evaluate al- 
gorithm. It proposes a course of action for exploration. It then evaluates 
these actions by measuring the residual entropy. Building on the fundamen- 
tal insight discussed above, the entropy is computed by adding two terms; 
a term corresponding to the robot pose at the end of the proposed explo- 
ration sequence, and a term pertaining to the expected map entropy. The 
exploration algorithm then selects the controls that minimize the resulting 
entropy. 

In detail, the algorithm FastSLAM. exploration accepts as input a set of 
particles and returns as output a proposed sequence of control for explo- 
ration. Line 4 of Table 17.3 proposes a potential control sequence. The eval- 
uation of a control sequence takes place in lines 5 through 16. It is organized 
in three parts. First, the robot is simulated, based on a random particle in the 
particle set. This simulation uses the stochastic model of the robot and its 
environment. The result is a sequence of particle sets, all the way to the end 
of the control trajectory. This simulation takes place in lines 5 through 9. 

Subsequently, the entropy of the final particle set is calculated. Through 
the mathematical decomposition stated in Equation 17.19, the entropy is fac- 
tored into two terms: A term related to the entropy of the robot pose estimate 
at time T, and a term related to the expected map uncertainty. The first term 
is calculated in lines 11 and 12. Its correctness follows from the calculation of 
the entropy of a Gaussian, which is derived in Table 17.4. 

The second entropy term is computed in lines 13 through 16. Notice that 
the calculation of the second term involves the entropy of a map m. For 
occupancy grid maps, this computation is analogous to the one discussed in 
the previous section. Lines 13 through 16 compute the average entropy of 
the map, where the average is taken over all particles at time T. The result 
is a value h which measures the expected entropy at time T', conditioned on 
the proposed control sequence. Lines 17 through 20 then select the action 
sequence that minimizes this expected entropy, which is ultimately returned 
in line 22. 

Note that this algorithm uses an approximation to compute the entropy of 
the posterior about the trajectories in line 11. Instead of fitting a Gaussian 
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Lemma. The entropy of a multivariate Gaussian of d dimensions and covari- 
ance 2 is given by 


d 1 
H = Pis + log 27) 十 5 log det (£) 
Proof. With 


1 1 
p(t) = (21)- 2 det(X)- 2 exp {—5 a? etal Oo pp 5 uod n} 


= E[-logp(z) 
1 


= 5 (dlog 2m + logdet(X) + Ela? X! a] - 2E[z7] X! p +u” X! y) 


Here E[r7] = u7, and E[zT X^! z] resolve as follows (where ^." denotes 
the dot product) 


E|r?3r*w] = Ext. 
= Ejea] 57! 
= uut- EtL. 
= uE dud 
It follows that 
1 
五 plz] = (dlog 2m + log det(X) + uE tu +d- 2u"E tu + W Eu) 


NIa NI 


1 
= c(1-log27)- 5 log det (X) 











Table 17.4 The entropy of a multivariate Gaussian. 


to all trajectory particles yl | it only computes it based on the last poses al L 


This approximation works well in practice, and it bears resemblance to our 
notion of exploration action approximation. 

In summary, the FastSLAM exploration algorithm is essentially an exten- 
sion of the Monte Carlo exploration algorithm stated in Table 17.1, with two 
insights. First, it is applied to entire sequences of controls, not just a single 
control. Second—and more importantly—the FastSLAM exploration algo- 
rithm computes two types of entropies, one pertaining to the robot path, and 
one to the map. 
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Empirical Characterization 


The exploration algorithm leads to appropriate exploratory behavior, specif- 
ically in cyclic environments. Figure 17.15 shows a situation in which a robot 
explores a cyclic environment containing a loop. The robot started in the lower 
right corner of the loop, labeled “start.” At time step 35, the actions consid- 
ered by the robot lead back to the start, and also the unknown area to the left 
of this marker. The anticipated knowledge gain for moving back to the start 
is higher than moving into unexplored terrain, since in addition to new map 
information the pose uncertainty will be reduced. Thus, the exploring robot 
actively decided to close the loop and move towards previously explored 
terrain. 

Figure 17.16 investigates the cost-benefit trade off closer. Shown there are 
8 different actions as indicated. The utility of action 1 happens to be the 
highest. It specifically outweighs the utility of action 4 (and any other non- 
loop-closing action), which is significantly lower. 

In time step 45, the robot closes the loop, as indicated in Figure 17.15. At 
this point, the pose uncertainty is minimal and the map uncertainty begins to 
dominate. As a result, the loop-closing action becomes unattractive. At time 
88, the robot chooses to explore the open area, to which it then advances as 
visualized in Figure 17.15. 

Figure 17.17 depicts the evolution of the overall entropy during this exper- 
iment over time. Up to time step 30, the reduction of the map uncertainty 
compensates the increase of uncertainty about the robot's trajectory. There- 
fore, the entropy stays more or less constant. Whereas the execution of the 
loop-closing action reduces the entropy in the belief about the trajectory of 
the robot, the change in map uncertainty is relatively small. This leads to a re- 
duction in the overall entropy. As soon as the robot integrates measurements 
covering so far unknown areas in the horizontal corridor, the changes in the 
map and pose uncertainties again compensate each other. The decline of the 
overall entropy around time step 90 is caused by observations in the wider 
parts of the horizontal corridor. This is due to the fact that in occupancy grid 
maps the reduction of the map uncertainty by incorporating a range scan 
generally depends linearly on the size of the unknown area covered by the 
scan. 

The intricate interplay between path and map uncertainty, and the cor- 
responding knowledge gain terms, play an essential role in the exploration 
approach. When appropriate, the robot sometimes prefers localization, and 
sometimes prefers moving into unmapped terrain. 
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start 


Figure 17.15 A mobile robot explores an environment with a loop. The robot starts 
in the lower right corner of the loop. After traversing it, it decides to follow the pre- 
vious trajectory again in order to reduce its uncertainty. Then it continues to explore 
the corridor. Courtesy of Cyrill Stachniss, University of Freiburg. 
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Figure 17.16 In this situation the robot determines the expected utility of possible 
actions. (a) the exploration actions considered by the robot; (b) the expected utility of 
each action. Action 1 is selected because it maximizes the expected utility. Courtesy 
of Cyrill Stachniss, University of Freiburg. 
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Figure17.17 The evolution of the entropy during the exploration experiment shown 
in Figure 17.15. Courtesy of Cyrill Stachniss, University of Freiburg. 


Summary 


In this chapter, we have learned about robot exploration. All algorithms in 
this chapter are motivated through a single aim: to maximize the knowledge 
gained by the robot. By directing control actions so as to maximize the know- 
ledge gain, the robot effectively explores. 


This idea was applied to a number of different exploration problems: 


In active localization, the robot seeks to maximize the knowledge of its 
pose relative to a known map. We devised an algorithm that calculates 
the expected knowledge gain for moving to any relative robot pose. It 
trades off the gain with the minimum costs of moving there. The resulting 
algorithm does a fine job selecting locations that result in high information 
gain. 


In exploration for mapping, the robot knows its pose at all times. Instead, 
it has to gather information about its environment. Building on the occu- 
pancy grid mapping paradigm, we noted that the information gain can be 
computed separately for each grid cell in the map. We compared a num- 
ber of different techniques for computing information gain and noted that 
simple techniques—such as the entropy—do quite well. We then devised 
a dynamic programming algorithm for moving to the nearby point that 
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optimizes the trade-off between information gain and costs of moving 
there. 


* We augmented the knowledge gain exploration technique to the case of 
multiple robots. This extension proved remarkably straightforward. This 
is because an easy modification of the dynamic programming paradigm 
makes it possible to compute the cost of moving to any location, and trade 
it off with the knowledge gain. By comparing costs and knowledge gain 
for different target locations, multiple robots can coordinate their explo- 
ration assignments so as to minimize the overall exploration time. 


* Finally, we discussed exploration techniques for the full SLAM problem, 
in which both the robot pose and the environment map are unknown. 
Our approach observed a fundamental decomposition of the entropy into 
two terms, one pertaining to the path uncertainty, and one to the uncer- 
tainty in the map (averaged over all paths). This insight was leveraged 
into a generate-and-test algorithm for exploration, which generates con- 
trol sequences, computes future estimates, and trades off path uncertainty 
versus map uncertainty when evaluating such control sequences. The 
result was an exploration technique that sometimes leads the robot into 
unknown terrain so as to improve the maps, and sometimes back to pre- 
viously mapped terrain so as to improve the pose estimates. 


Most of the exploration techniques in this chapter were greedy in the sense 
that the robot only considers a single action choice in its exploration decision. 
This greediness is the result of the enormous branching factor in most explo- 
ration problems, which renders multi-step planning intractable. However, 
the choice of the right exploration action required some thought. 

The algorithms in this chapter consider moving to any point in the robot's 
local coordinate system as an exploration action. Thus, the basic action unit 
considered here goes significantly beyond the basic robot control action as 
defined in Chapter 5 of this book. It is this definition of action that makes 
seemingly simple exploration techniques applicable to complex multi-step 
robot exploration problems. 

It was noted that exploration can also be formulated as a general POMDP 
problem, using a payoff function that rewards information gain. However, 
POMDPs are best when the branching factor is small, and the number of 
possible observations is limited. Exploration problems are characterized 
by huge state and observation spaces. Hence they are best solved through 
greedy techniques that directly maximize information gain. 
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Bibliographical Remarks 


Exploration has been a primary application domain in robotics systems development, with ap- 
plications as far reaching as volcano exploration (Bares and Wettergreen 1999; Wettergreen et al. 
1999), planetary/lunar exploration (Gat et al. 1994; Hóllerer et al. 1999; Krotkov et al. 1999; 
Matthies et al. 1995; Li et al. 2000), search and rescue (Murphy 2004), abandoned mine mapping 
(Madhavan et al. 1999; Thrun et al. 2004c; Baker et al. 2004), meteorite search in Antarctica (Urm- 
son et al. 2001; Apostolopoulos et al. 2001), desert exploration (Bapna et al. 1998) and underwater 
exploration (Ballard 1987; Sandwell 1997; Smith and Dunn 1995; Whitcomb 2000). 

The literature on algorithm design for robotic exploration draws its root in the various fields 
of information gathering and decision theory referenced in the two previous chapters. One 
of the early approaches to robot exploration is the algorithm described by Kuipers and Byun 
(1990) and Kuipers and Byun (1991), see also Pierce and Kuipers (1994). In this approach, the 
robot identifies so-called locally distinguishable places that allow it to distinguish between al- 
ready visited and so far unexplored areas. A similar seminal paper is by Dudek et al. (1991), 
who developed an exploring strategy for exploring an unknown graph-like environment. Their 
algorithm does not consider distance metrics and is specifically designed for robots with very 
limited perceptual capabilities. 

An early exploration technique for learning topological maps with mobile robots was pro- 
posed by Koenig and Simmons (1993). The idea of actively exploring for occupancy grid map- 
ping using dynamic programming goes back to Moravec (1988) and Thrun (1993). Tailor and 
Kriegman (1993) described an approach to visit all landmarks in an environment for learning a 
feature-based map. In this system the robot maintains a list of all unvisited landmarks in the en- 
vironment. The idea of information maximization for exploration using a statistical formulation 
can be found in Kaelbling et al. (1996). Yamauchi et al. (1999) introduced the frontier-based ap- 
proach to mobile robot exploration, specifically seeking out the frontiers between the explored 
and the unexplored for directing a robot's actions. More recently, González-Bafios and Latombe 
(2001) proposed an exploration strategy that considers the amount of unseen area that might 
be visible to the robot from possible view points, to determine the next action. Similar explo- 
ration strategies have also become popular in the area of 3-D object modeling. For example, 
Whaite and Ferrie (1997) study the problem of scanning objects and consider the uncertainty in 
the parameters of a model to determine the next best view-point. 

The exploration approach has also been extended to teams of collaboratively exploring 
robots. Burgard et al. (2000), and Simmons et al. (2000a) extended the greedy exploration frame- 
work to robot teams who jointly explore and seek to maximize their map information; see also 
Burgard et al. (2004). This approach is similar to the incremental deployment technique intro- 
duced by Howard et al. (2002) as well as to the algorithm proposed by Stroupe (2004). Market- 
based techniques for coordinated exploration have been investigated by Zlot et al. (2002). Dias 
et al. (2004) analyzed potential failures during multi-robot coordination, and provided an im- 
proved algorithm. An approach to deal with heterogeneous teams has been presented by Singh 
and Fujimura (1993). An extension to coordinated exploration from multiple, unknown start 
locations was introduced by Ko et al. (2003) and tested thoroughly in Konolige et al. (2005). The 
approach uses a structural estimate of the environment along with a modified version of MCL 
localization to estimate the relative robot locations (Fox et al. 2005). In Rekleitis et al. (2001b), 
the authors propose an exploration technique in which one robot observes another exploring, 
thereby reducing its location uncertainty. Some of the multi-robot exploration experiments pre- 
sented in this chapter were first published by Thrun (2001). 
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In some papers, the map exploration problem has been studied as a coverage problem, which 
addresses the problem of algorithm design for exhaustively covering an unknown environment. 
A recent paper by Choset (2001) provides a comprehensive survey into this field. Newer tech- 
niques (Acar and Choset 2002; Acar et al. 2003) have approached this problem from a statistical 
technique, with algorithms not dissimilar from the ones discussed here. 

Within the context of SLAM, several authors have devised exploration techniques that can 
jointly optimize for map coverage and active localization. Makarenko et al. (2002) describe 
an approach that determines the actions to be carried out based on expected information gain 
obtained by re-observing landmarks (to more accurately determine their location or the pose of 
the robot) and by exploring unknown terrain. In a similar vein, Newman et al. (2003) describe an 
exploration approach in the context of the Atlas (Bosse et al. 2003) framework for efficient SLAM. 
Here the robot builds a graph-structure to represent visited areas. Sim et al. (2004) specifically 
addresses the problem of trajectory planning in SLAM. He considers a parameterized class of 
spiral trajectory policies in the context of an EKF-based approach to the SLAM problem. The 
FastSLAM exploration technique described in this chapter is due to Stachniss and Burgard (2003, 
2004). A technique for SLAM exploration whereby a robot drops markers to aid the localization 
problem is described in Batalin and Sukhatme (2003). 

The performance analysis of robotic exploration strategies has also been the subject of consid- 
erable interest. Several authors provided mathematical or empirical analyses of the complexity 
of different exploration strategies (Albers and Henzinger 2000; Deng and Papadimitriou 1998; 
Koenig and Tovey 2003; Lumelsky et al. 1990; Rao et al. 1993). For example, Lee and Recce 
(1997) presented an experimental study in which they compare the performance of different 
exploration strategies for single robots. 

Our technique for active localization in mobile robotics was first published in Burgard et al. 
(1997) and Fox et al. (1998). More recently, Jensfelt and Christensen (2001a) presented a sys- 
tem that uses a mixture of Gaussians to represent the posterior about the pose of the robot 
and describe how to perform active localization given this representation. The problem of ac- 
tive localization has also been studied theoretically, for example, by Kleinberg (1994) under the 
assumptions of perfect sensors. 

Several authors have developed robotic exploration strategies for dynamic environments. 
Of specific interest have been pursuit evasion games, as discussed in the rich literature on dif- 
ferential games (Isaacs 1965; Bardi et al. 1999). Techniques for pursuit evasion in indoor mobile 
robotics are due to LaValle et al. (1997) and Guibas et al. (1999), and have recently been extended 
by Gerkey et al. (2004). 

Finally, exploration has been intensively studied in automata theory. The sequential deci- 
sion making paradigm in which a learner receives payoff as it experiments has originally been 
studied in the context of simple finite state automata known as bandits, see Feldman (1962); 
Berry and Fristedt (1985); Robbins (1952). Techniques for learning the structure of finite state 
automata go back to Rivest and Schapire (1987a,b) and Mozer and Bachrach (1989), who de- 
veloped techniques for generating sequences of tests that distinguish different states in an FSA. 
State-based bounds on the complexity of exploring deterministic environments have been de- 
rived by Koenig and Simmons (1993) and Thrun (1992), which were later extended to stochastic 
environments by Kearns and Singh (2003). 
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17.8 Exercises 


1. Consider a robot that operates in the triangular environment with three 


types of landmarks: 


> p» 


Each location has two different landmarks, each with a different color. 
Let us assume that in every round the robot can only inquire about the 
presence of one landmark type: either the one labeled "r", the one labeled 
“g”, or the one labeled "b". Suppose the robot first fires the detector for 
“b” landmarks and moves clockwise to the next arc. What would be the 
optimal landmark detector to use next? How would the answer change if 


it does not move or if it moved counterclockwise to the next arc? 


. Suppose you are given K omnidirectional robots, which for the sake of 


this question can move and sense in all directions at all times. In this 
question, we would like to see each visible location once; we do not care 
about the benefits of seeing locations more than once. 


The text noted that the use of multiple robots can speed-up exploration 


by more than just a unitary factor (meaning: K robots can be more than 
K times faster than 1 robot). 


(a) How much faster can a team of K robots be when compared to a single 
exploring robot? 


(b) Provide an example environment that maximizes the speed-up for K = 
4 robots, and discuss the exploration strategy that leads to this speed- 


up. 


3. Suppose you are chasing a moving intruder through a known, bounded 


environment. Can you draw an environment where K robots can succeed 
in finding the intruder in finite time, but K — 1 robots cannot? Draw such 
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4. 


5. 


an environment for K = 2, K = 3, and K = 4 robots. Notice that your 
result may make no assumption on the motion strategy of the intruder, 
other than: if she is in your field of view, you see her. 


A very simplistic exploration problem is known as the K-arm bandit prob- 
lem. In it you face a slot machine with K arms. Each arm provides a payoff 
of $1 with px probability, where px is fixed but unknown to you. Your 
task is to select arms to play such that your overall payoff is maximal. 


(a) Prove that the greedy exploration strategy can be suboptimal, where 
“greedy” is defined through the optimal choice of action relative to a 
maximum likelihood estimator of the probabilities p;. (After n plays of 
arm k, the maximum likelihood estimate of p; is given by n; /n, where 
nj is the number of times you received the $1 payoff.) 


(b) Prove that an optimal exploration strategy may never abandon any of 
the arms. 
(c) Implement the K-arm bandit for K = 2, with both probabilities pj and 


po chosen uniformly from the interval [0; 1]. Implement as good an 
exploration strategy as you can find. Your exploration strategy may 
only depend on the variables n; for i = 1,2. Explain your strategy, and 
measure your overall return for 1,000 games, each lasting 100 steps. 


In Chapter 17.4, we empirically compared two different ways of calculat- 
ing the information gain for a grid cell: entropy and expected entropy 
gain. Provide a mathematical bound for the error between these two 
quantities under the assumptions stated in the chapter. For which map 
occupancy value(s) will this error be maximal? For which value(s) will it 
be minimal? 


. In the text we encountered an expression for the entropy of a Gaussian. 


We would like you to compute the expected information gain for a simple 
Gaussian update. Suppose we are estimating an unknown state variable 
x, and our present maximum likelihood estimates are u and X. Let us 
further assume our sensor can measure z, but the measurement will be 
corrupted by Gaussian noise with covariance Q. Provide an expression 
for the expected knowledge gain through taking a sensor measurement. 
Hint: Focus on the covariance, not the mean. 
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anticipated uncertainty, 488 
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CML, see SLAM 

coastal navigation, 8, 558 

coin flip, 14 

combined state vector, 313 

complete state, 21, 24 

complete state assumption, 33 

computer vision, 332 

concurrent mapping and localization, 
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cumulative payoff, 497 
cyclic environment, 282 

in exploration, 598 

in FastSLAM, 471 

in SLAM, 343 


data association, 178 
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algorithm, 59 
EKF localization, 201 
algorithm with known 
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general algorithm, 216, 217 
illustration, 201, 210 
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algorithm with known 
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fundamental dilemma, 330 
general algorithm, 321, 323 
numerical instability, 329 
EM algorithm, 333 
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for learning a sensor model, 165 
for outlier rejection, 269 
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decomposition, 593 
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environment, 19 
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algorithm 
expectation of a random variable, 18 
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experimental design, 542 
exploration, 569 
exploration action, 570, 576, 585 
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Monte Carlo algorithm, 573, 574 
multi-robot, 587 
multi-robot algorithm, 588 
multi step, 575 
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v1.0 algorithm with known 
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v1.0 general algorithm, 460, 461 
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finite-horizon case, 497 
frontier 
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of a search tree, 416 
frontier-based exploration, 584 
full SLAM problem, 310 


Gaussian, 14 
as a posterior, 40 
canonical parameterization, 71 
conditioning lemma, 359 
for a particle filter, 104 
marginalization lemma, 358 
mixture, 63 
multivariate, 15 
noise, in SLAM, 312 
Gaussian filter, 39 
generative model, 17 
global correspondence problem, 424 
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gradient descent, 376 
in learning, 296 
graphical SLAM, 382 
GraphSLAM, 337 
algorithm, 346, 347, 365 
correspondence test algorithm, 364 
with unknown correspondence, 
363 
greedy case, 497 
grid-based FastSLAM, 474 
grid localization, 238 
algorithm, 238, 239 
illustration, 245 


heading direction, 119 
heaven-or-hell problem, 510 
hidden Markov model, 25, 87 
histogram filter, 86 
dynamic decomposition, 92 
for localization, 238 
selective updating, 93 
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static decomposition, 92 
histogram in a particle filter, 105 
holonomic robot, 147 
horizon, 497 
hybrid control, 12 
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importance factor 
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importance sampling 

in a particle filter, 99 
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conditional, 17 
infinite-horizon case, 498 
information filter, 71 

algorithm, 73 
information gain, 572 
information gathering, 488 
information matrix, 71 
information state, 26 

in a POMDP, 494 
information vector, 71 
injection of random particles, 256 
innovation, 43, 214 
inverse measurement model, 95, 294 
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inversion lemma, 50 


Jacobian, 58 
joint distribution, 15 
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algorithm, 42, 43 
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Kalman gain, 43, 52 
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kernel density estimation, 105 


5ris.cn 000000 


Index 


kidnapped robot problem, 194 
kinematics, 118 
kinematic state, 20 
KLD-sampling, 263 

algorithm, 264 

comparison, 265 
knowledge state, 26 
Kullback-Leibler divergence, 263 
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landmark existence probability, 329 
landmark model 

algorithm, 178, 179 

sampling algorithm, 179, 180 
laser range scan, 150 
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of a measurement model, 158 
least squares 

in SLAM, 337 
Levenberg Marquardt, 376 
likelihood field 

algorithm, 171, 172 

for range finders, 169 
likelihood test for correspondence, 
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linear Gaussian SLAM, 382 
linear Gaussian system, 40 

measurement probability, 42 

state transition, 41 
linearization, 56 

in GraphSLAM, 356 
localization 

comparison chart, 273 

in a dynamic environment, 267 

Markov, 197 

of a mobile robot, 5 

taxonomy, 193 
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with an EKF, 201 
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local submaps, 369 
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of a robot, 119 
location-based map, 152 
logistic regression, 294 
log odds ratio, 94 
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loop closure, 471 
loopy belief propagation, 79 
low variance sampling, 109 
algorithm, 109, 110 


Mahalanobis distance, 72 
map, 152 

feature-based, 152 

location-based, 152 

metric, 93 

of an environment, 152 

topological, 93 

volumetric, 152 
map-based motion model, 140 
map management 

in FastSLAM, 459 

in SLAM, 328 
map matching, 174, 233, 275 
MAP occupancy grid map, 299 
MAP occupancy grid mapping 

algorithm, 301, 302 

mapping with known poses, 283 
marginalization lemma, 358 
marginal of a Gaussian, 358 
market-based algorithms, 589 
Markov assumption, 33 
Markov blanket, 419 
Markov chain, 21 
Markov localization, 197 
algorithm, 197 
illustration, 5, 200 
Markov random field, 79 
matrix inversion lemma, 50 
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maximum likelihood correspondence 
in SLAM, 323 
maximum of linear functions, 530 
MC-POMDP, 559 
algorithm, 559, 560 
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algorithm, 252 
algorithm with adaptive sample 
set, 264 
augmented algorithm, 257, 258 
with adaptive sample set, 263 
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262 
with random particles, 256 
MDP, 491 
measurement, 22 
measurement arcs, 337 
measurement innovation, 214 
measurement likelihood, 208 
measurement model, 149 
correlation-based, 174 
feature-based, 176 
for a range finder, 153 
measurement noise, 154, 169 
measurement probability, 25 
in a linear Gaussian, 42 
measurement update, 26 
in a Bayes filter, 27 
in an information filter, 75 
metric map, 93, 241 
Mixture MCL, 262 
mixture of Gaussians, 63 
mixture weight in an MHT, 219 
model-based paradigm, 11 
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moments parameterization, 40 
Monte Carlo exploration, 573 
algorithm, 573, 574 
Monte Carlo localization, see MCL 
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motion model, 119 
algorithm with maps, 141 
map-based, 140 
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odometry algorithm, 134 
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velocity algorithm, 121, 123, 129 
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multi-hypothesis tracking, 218 
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exploration, 587 
exploration algorithm, 588 
localization, 196 
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multi-sensor fusion, 293 
multi-step exploration, 575 
multivariate distribution, 15 
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natural parameterization, 40 
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in a SEIF, 421 
in FastSLAM, 459 
in GraphSLAM, 370 
in localization, 231 
neural network, 294, 296 
nonlinear motion model, 54 
nonlinear state transition, 54 
nonparametric filter, 85 
normal distribution, 14, 128 
algorithm for computing, 123 
algorithm for sampling, 124 
normalizer in Bayes rule, 17 
numerical instability of EKF SLAM, 
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occupancy grid map, 94, 281 

algorithm, 286 
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odds of an event, 94 
odometer, 23 
odometry algorithm 

sampling algorithm, 136 
odometry model 

algorithm, 134, 135 

sampling algorithm, 137 
odometry motion model, 132 
online SLAM problem, 309 
outlier rejection 

in localization, 229 

in MCL, 269 

in SLAM, 328 
overconfidence in perception, 183 


partially observable Markov decision 
process, see POMDP 
partial observability, 488 
particle 
in a particle filter, 97 
in FastSLAM, 444 
particle deprivation, 112 
particle filter, 96 
algorithm, 98 
Gaussian approximation, 104 
histogram approximation, 105 
in localization, 250 
in SLAM, 437 
low variance sampling, 109 
particle deprivation, 112 
stratified sampling, 111 
passive localization, 195 
payoff function, 495 
in a POMDP, 516 
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photogrammetry, 332 
piecewise linear function, 514 
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point-based value iteration, 538 
policy, 496 
policy for action selection, 493 
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approximate algorithms, 547 

numerical example, 515 
pose, 20 
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position estimation, 191 
position tracking, 193, 197 
positive information, 313 
posterior probability, 16 
potential field, 11 
pre-caching of a measurement model, 

167, 243 
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prediction, 26 
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prior probability, 16 

in a binary Bayes filter, 96 
proactive SLAM, 339 
probabilistic robotics, 4 
probability density function, 14 
proposal distribution 
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pruning, 220 
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random variables, 14 

range and bearing sensor, 177 
range finder algorithm, 158 
Rao-Blackwellized particle filter, 437 
ray casting, 154 

relative motion information, 133 
relaxation algorithm, 390 
resampling, 99 
resource-adaptive algorithms, 86 
robot environment, 19 

robot learning problem, 9 
rotational velocity, 121 
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from a normal distribution, 124 
from a probability density function, 
122 
from a triangular distribution, 124 
sampling algorithm 
landmark model, 179, 180 
model model, 141 
normal distribution, 124 
odometry model, 136, 137 
triangular distribution, 124 
velocity motion model, 122, 124 
scan matching 
for localization, 234 
Schur complement, 359 
SEIF, 385 
algorithm with known 
correspondence, 391, 392 
correspondence test, 416 
correspondence test algorithm, 418 
map fusion algorithm, 425, 426 
selective updating 
in localization, 244 
of a histogram filter, 93 
semi Markov decision process, 276 
sensor failure, 156, 171 
sensor measurement, 22 
sensor resetting in localization, 276 
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Shepard’s interpolation, 561 
Sherman/ Morrison formula, 50 
sigma point, 65 
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mapping, see SLAM 
single-robot localization, 196 
situated agent, 12 
SLAM, 309 
exploration, 593 
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soft data association constraints, 415 
sonar range scan, 149 
sparse extended information filter, see 
SEIF 
sparseness 
in SEIF, 387 
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sparsification, 390 
in SEIF, 400 
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specular reflection, 150 
state 
augmented, 550 
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dynamic, 20 
hybrid, 21 
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kinematic, 20 
of an environment, 20 
of knowledge, 26 
static, 20 
transition probability, 25 
state augmentation, 269 
state transition 
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static environment, 194 
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stochastic action effects, 488 
stratified sampling, 111 

structure from motion, 334, 381 
sufficient statistics, 181 

supervised learning algorithm, 294 


target distribution 
in a particle filter, 100 
in FastSLAM, 447 
taxonomy of localization problems, 
193 
Taylor expansion, 57 
textbooks on mobile robotics, 145 
Theorem of total probability, 16 
thin junction filter, 434 
tiger problem, 544 
topological map, 93, 239 
track, 220 
training example, 295 
translational velocity, 121 
triangular distribution, 128 
algorithm for computing, 123 
algorithm for sampling, 124 
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algorithm, 70 
UKF localization 
algorithm, 221 
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undo a data association, 363, 369 
unexplainable measurements, 157, 
171 
universal plan, 493 
unscented Kalman filter, 65 
unscented transform, 65 
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value function, 499 

value iteration 
algorithm for an MDP 501, 502 
algorithm for a POMDP, 527, 529 
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velocity motion model, 121 
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